Deriving the distribution of coordinate errors in displacement processes with minimum displacement distance

When dealing with data on sensitive subjects, one often encounters artificial measurement error introduced to protect confidentiality. In the field of geomasking specifically, survey units are geo-located and their coordinates randomly displaced. Measurement error models are useful to tackle this issue in an analysis, but those typically require analytical expressions for the distribution of coordinate errors for a given displacement mechanism.

In a somewhat recent PhD thesis, Hossain (2023) has derived an analytical expression for the error distribution of the most basic geomask, the circular uniform random displacement mask. In this post, I build on his work to derive an analogous formula for the more recent so-called donut mask.

Background

The most straightforward geomask is the random displacement geomask, which has been treated in several previous posts. It draws a random angle and a random distance, both from a uniform distribution, then calculates the $x$- and $y$-offsets by basic trigonometry.

Donut masking was introduced by Hampton et al. (2010) and derives its current-day relevance from its implementation in the online geomasking service MaskMy.XYZ (see also Swanlund et al., 2020). Its most distinguishing feature is a minimum displacement distance, which prevents sensitive locations from being moved too little. Specifically, a true coordinate $(x, y)$ is masked with coordinate errors $(e_x, e_y)$ according to the following scheme: \[\begin{aligned} (x', y') &= (x + e_x, y + e_y) \text{ with }\\e_y &= \sin(\alpha) \cdot d \; , \; e_x = \cos(\alpha) \cdot d \\ \alpha &\sim \mathcal{U}(0, 2\pi) \;,\; d \sim \mathcal{U}(\delta_{\min}, \delta_{\max}) \end{aligned}\] with $\delta_{\min}, \delta_{\max}$ the minimum and maximum displacement distances, respectively. We are looking for the density function $f_{e_y, e_x}$, which could then be used to derive the conditional distribution $f_{x, y | x', y'}$ for modeling the donut masking error in a specific application. It is well known that the error distribution from donut masking is not simply the uniform distribution over a ring-shaped area. Rather, probability mass concentrates toward the inner edge of the ring (see figures below). Finding $f_{e_y, e_x}$ therefore requires a few more steps.
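To make the scheme concrete, here is a minimal sketch in Python using only the standard library. The function `donut_mask` and its parameter names are hypothetical and chosen for illustration; this is not the MaskMy.XYZ implementation. Setting `d_min = 0` gives the plain random displacement geomask.

```python
import math
import random


def donut_mask(points, d_min, d_max, rng=None):
    """Mask each true coordinate (x, y) with the donut scheme.

    The angle is drawn from U(0, 2*pi) and the distance from U(d_min, d_max);
    the offsets follow e_y = sin(alpha) * d and e_x = cos(alpha) * d.
    """
    if rng is None:
        rng = random.Random()
    masked = []
    for x, y in points:
        alpha = rng.uniform(0.0, 2.0 * math.pi)  # random angle
        d = rng.uniform(d_min, d_max)            # random distance, at least d_min
        e_y = math.sin(alpha) * d
        e_x = math.cos(alpha) * d
        masked.append((x + e_x, y + e_y))
    return masked


if __name__ == "__main__":
    rng = random.Random(1)                       # fixed seed for reproducibility
    true_points = [(0.0, 0.0)] * 5
    for x_masked, y_masked in donut_mask(true_points, 500.0, 2500.0, rng):
        # The distance to the true location (the origin) lies in [500, 2500]
        print(f"({x_masked:8.1f}, {y_masked:8.1f})  "
              f"distance = {math.hypot(x_masked, y_masked):6.1f}")
```

Passing an explicit `random.Random` instance keeps simulations reproducible without touching the global random state.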

Derivation

We begin by rewriting the random components as $\alpha = 2 \pi v_1$ and $d = \delta_{\min} + \delta v_2$ with $\delta := \delta_{\max} - \delta_{\min}$ and $v_1, v_2 \overset{iid}{\sim} \mathcal{U}(0,1)$. This reduces the randomness to the joint distribution of two variables, each on the unit interval: $v_1$ scales the angle to $(0, 2\pi]$ and $v_2$ scales the displacement distance to $(\delta_{\min}, \delta_{\max}]$. Since both are drawn independently by design, this gives the simple joint density $f_{v_1, v_2}(v_1, v_2) = 1$ for $0 < v_1 \leq 1, 0 < v_2 \leq 1$. With this, we aim to derive $f_{e_y, e_x}(e_y, e_x)$ by applying the following transformations: \[\begin{aligned} e_y &= g_1(v_1, v_2) = \sin(2 \pi v_1) \cdot (\delta_{\min} + \delta v_2),\\ e_x &= g_2(v_1, v_2) = \cos(2 \pi v_1) \cdot (\delta_{\min} + \delta v_2). \end{aligned}\] The image below plots simulated results, convincing us that this is indeed the required 'donut' shape.

Visualization of the transformation $e_y = g_1(v_1, v_2)$, $e_x = g_2(v_1, v_2)$, with $10^4$ sample points using $\delta_{\min} = 500$, $\delta = 2000$; cf. Hossain (2023), p.37 for the case without minimum displacement.
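The transformation from the unit square to the coordinate errors can be sketched as a small Python function, here with the caption's values $\delta_{\min} = 500$ and $\delta = 2000$. The name `g` (returning the pair $(g_1, g_2)$) is a hypothetical choice for illustration. Feeding it independent uniform draws for $v_1$ and $v_2$ produces samples like those in the figure, all at distances between $\delta_{\min}$ and $\delta_{\max} = 2500$.

```python
import math
import random


def g(v1, v2, d_min, delta):
    """Map (v1, v2) in the unit square to the coordinate errors (e_y, e_x).

    Implements e_y = g_1(v1, v2) and e_x = g_2(v1, v2) with
    d = d_min + delta * v2 and alpha = 2 * pi * v1.
    """
    d = d_min + delta * v2
    alpha = 2.0 * math.pi * v1
    return math.sin(alpha) * d, math.cos(alpha) * d


if __name__ == "__main__":
    # Parameters from the figure caption: delta_min = 500, delta = 2000
    rng = random.Random(0)
    samples = [g(rng.random(), rng.random(), 500.0, 2000.0) for _ in range(10_000)]
    radii = [math.hypot(e_y, e_x) for e_y, e_x in samples]
    print(f"min distance {min(radii):.1f}, max distance {max(radii):.1f}")
```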

We obtain the desired density from the known one via the change-of-variables formula (see e.g. Rohatgi, 1976): \[\begin{aligned} f_{e_y, e_x}(e_y, e_x) &= f_{v_1, v_2}\left(g_1^{-1}(e_y, e_x), g_2^{-1}(e_y, e_x)\right) \cdot |J(g_1^{-1}, g_2^{-1})|\\ &= |J(g_1^{-1}, g_2^{-1})| \end{aligned}\] where $|J(g_1^{-1}, g_2^{-1})|$ is the absolute value of the determinant of the Jacobian matrix $J = \partial (g_1^{-1}, g_2^{-1}) / \partial (e_y, e_x)$. The first factor drops out because $f_{v_1, v_2}(v_1, v_2) = 1$ on the unit square. The inverse transformations $g_1^{-1}$ and $g_2^{-1}$ closely follow Hossain (2023), p.38 and use the two basic trigonometric identities $\frac{\sin(a)}{\cos(a)}= \tan(a)$ and $\sin^2(a) + \cos^2(a) = 1$: \[\begin{aligned}\frac{e_y}{e_x} &= \frac{\sin(2 \pi v_1) \cdot (\delta_{\min} + \delta v_2)}{\cos(2 \pi v_1) \cdot (\delta_{\min} + \delta v_2)} = \tan(2 \pi v_1)\\ \Leftrightarrow v_1 &= \frac{1}{2\pi} \cdot \arctan\left(\frac{e_y}{e_x}\right) = g_1^{-1}(e_y, e_x), \\ e_y^2 + e_x^2 &= \sin^2(2 \pi v_1) \cdot (\delta_{\min} + \delta v_2)^2 + \cos^2(2 \pi v_1) \cdot (\delta_{\min} + \delta v_2)^2\\ &= (\delta_{\min} + \delta v_2)^2 \\ \Leftrightarrow v_2 &= \frac{\sqrt{e_y^2 + e_x^2} - \delta_{\min}}{\delta} = g_2^{-1}(e_y, e_x). \end{aligned}\] Strictly speaking, $\arctan$ only returns values in $(-\pi/2, \pi/2)$, so recovering $v_1$ on all of $(0, 1]$ requires the quadrant-aware two-argument arctangent; since the two differ only by a piecewise constant, the partial derivatives below are unaffected. Not surprisingly, the transformation of the angle variable $v_1$ is the same as in Hossain (2023), since it does not depend on the minimum displacement. The second variable $v_2$ shows two modifications compared to the case without a minimum: the constant $\delta_{\min}$ now appears, and $\delta$ is not the maximum displacement but the width of the displacement range, $\delta = \delta_{\max} - \delta_{\min}$. For $J$ we next need the partial derivatives: \[\begin{aligned} \frac{\partial g_1^{-1}}{\partial e_y} &= \frac{\partial \frac{1}{2\pi} \arctan\left(\frac{e_y}{e_x}\right)}{\partial e_y} = \frac{1}{2 \pi} \cdot \frac{1}{1 + \left(\frac{e_y}{e_x}\right)^2} \cdot \frac{1}{e_x} = \frac{1}{2\pi} \cdot \frac{e_x}{e_y^2 + e_x^2},\\
\frac{\partial g_1^{-1}}{\partial e_x} &= \frac{\partial \frac{1}{2\pi} \arctan\left(\frac{e_y}{e_x}\right)}{\partial e_x} = \frac{1}{2 \pi} \cdot \frac{1}{1 + \left(\frac{e_y}{e_x}\right)^2} \cdot \left(-\frac{e_y}{e_x^2}\right) = -\frac{1}{2\pi} \cdot \frac{e_y}{e_y^2 + e_x^2},\\
\frac{\partial g_2^{-1}}{\partial e_y} &= \frac{\partial \frac{1}{\delta} \sqrt{e_y^2 + e_x^2} - \frac{\delta_{\min}}{\delta}}{\partial e_y} = \frac{1}{\delta} \cdot \frac{1}{2} \cdot \frac{1}{\sqrt{e_y^2 + e_x^2}} \cdot 2e_y = \frac{e_y}{\delta \sqrt{e_y^2 + e_x^2}},\\
\frac{\partial g_2^{-1}}{\partial e_x} &= \frac{\partial \frac{1}{\delta} \sqrt{e_y^2 + e_x^2} - \frac{\delta_{\min}}{\delta}}{\partial e_x} = \frac{1}{\delta} \cdot \frac{1}{2} \cdot \frac{1}{\sqrt{e_y^2 + e_x^2}} \cdot 2e_x = \frac{e_x}{\delta \sqrt{e_y^2 + e_x^2}}. \end{aligned}\] This gives us $J(g_1^{-1}, g_2^{-1})$. For the determinant we then have \[\begin{aligned} |J(g_1^{-1}, g_2^{-1})| &= \begin{vmatrix} \frac{\partial g_1^{-1}}{\partial e_y} & \frac{\partial g_1^{-1}}{\partial e_x} \\ \frac{\partial g_2^{-1}}{\partial e_y} & \frac{\partial g_2^{-1}}{\partial e_x}\\ \end{vmatrix}\\ &= \frac{1}{2 \pi} \cdot \frac{e_x}{e_y^2 + e_x^2} \cdot \frac{e_x}{\delta \sqrt{e_y^2 + e_x^2}} - \frac{1}{2 \pi} \cdot \left(-\frac{e_y}{e_y^2 + e_x^2}\right) \cdot \frac{e_y}{\delta \sqrt{e_y^2 + e_x^2}}\\ &= \frac{1}{2\pi \delta \sqrt{e_y^2 + e_x^2}} \cdot \frac{e_x^2 + e_y^2}{e_y^2 + e_x^2} = \frac{1}{\delta} \cdot \frac{1}{2 \pi \sqrt{e_y^2 + e_x^2}}.\end{aligned}\] The determinant is positive, so taking the absolute value changes nothing. After re-substituting $\delta = \delta_{\max} - \delta_{\min}$ we get the joint density function of the coordinate errors: \[f_{e_y, e_x}(e_y, e_x) = \begin{cases} \frac{1}{\delta_{\max} - \delta_{\min}} \cdot \frac{1}{2 \pi \sqrt{e_y^2 + e_x^2}} & \text{ for } \delta_{\min} < \sqrt{e_y^2 + e_x^2} \leq \delta_{\max} \\ 0 & \text{ else}.\end{cases}\] Setting $\delta_{\min} = 0$ recovers the density derived by Hossain (2023), p.38 for the case without a minimum displacement distance. Below, I include a plot of $f_{e_y, e_x}$. Because the density decays like $1/\sqrt{e_y^2 + e_x^2}$ on the ring, probability mass concentrates around the inner radius $\delta_{\min}$, which distinguishes distributions in the context of geomasking from established disk point picking problems, where points are spread uniformly over the area.
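The partial derivatives and the determinant can be double-checked numerically. The following Python sketch (function names are hypothetical) implements $g_1^{-1}$ and $g_2^{-1}$, approximates the Jacobian by central finite differences, and compares $|\det J|$ with the closed form $1/(2\pi\delta\sqrt{e_y^2+e_x^2})$. One deliberate deviation from the formulas in the text: the angle is inverted with the two-argument arctangent `math.atan2`, so that all four quadrants are handled; this differs from $\arctan(e_y/e_x)$ only by a piecewise constant, so the derivatives are the same.

```python
import math


def g_inv(e_y, e_x, d_min, delta):
    """Inverse transformation (e_y, e_x) -> (v1, v2).

    Uses math.atan2 instead of arctan(e_y / e_x) so that the angle is recovered
    in all four quadrants; the modulo maps (-pi, pi] onto [0, 2*pi).
    """
    angle = math.atan2(e_y, e_x) % (2.0 * math.pi)
    v1 = angle / (2.0 * math.pi)
    v2 = (math.hypot(e_y, e_x) - d_min) / delta
    return v1, v2


def jacobian_det_numeric(e_y, e_x, d_min, delta, h=1e-3):
    """Central finite-difference estimate of |det J|, J = d(v1, v2) / d(e_y, e_x).

    Assumes (e_y, e_x) is not within h of the positive e_x-axis, where the
    angle wraps from 2*pi back to 0.
    """
    v1_yp, v2_yp = g_inv(e_y + h, e_x, d_min, delta)
    v1_ym, v2_ym = g_inv(e_y - h, e_x, d_min, delta)
    v1_xp, v2_xp = g_inv(e_y, e_x + h, d_min, delta)
    v1_xm, v2_xm = g_inv(e_y, e_x - h, d_min, delta)
    j11 = (v1_yp - v1_ym) / (2 * h)  # d g_1^{-1} / d e_y
    j12 = (v1_xp - v1_xm) / (2 * h)  # d g_1^{-1} / d e_x
    j21 = (v2_yp - v2_ym) / (2 * h)  # d g_2^{-1} / d e_y
    j22 = (v2_xp - v2_xm) / (2 * h)  # d g_2^{-1} / d e_x
    return abs(j11 * j22 - j12 * j21)


def jacobian_det_closed_form(e_y, e_x, delta):
    """Closed-form result 1 / (delta * 2 * pi * sqrt(e_y^2 + e_x^2))."""
    return 1.0 / (delta * 2.0 * math.pi * math.hypot(e_y, e_x))


if __name__ == "__main__":
    d_min, delta = 500.0, 2000.0
    for e_y, e_x in [(700.0, -900.0), (-1200.0, 300.0), (100.0, 1800.0)]:
        num = jacobian_det_numeric(e_y, e_x, d_min, delta)
        exact = jacobian_det_closed_form(e_y, e_x, delta)
        print(f"({e_y:7.1f}, {e_x:7.1f}): numeric {num:.6e}, closed form {exact:.6e}")
```

Both terms of the determinant are non-negative, so the finite-difference estimate does not suffer from cancellation and should agree with the closed form to many digits.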

Perspective image of $f_{e_y, e_x}$
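The closed-form density can also be checked against the simulation scheme. Integrating it in polar coordinates gives the distribution of the displacement distance, \[P\left(\sqrt{e_y^2+e_x^2} \le t\right) = \int_0^{2\pi}\int_{\delta_{\min}}^{t} \frac{1}{2\pi\delta r}\, r \,dr\,d\theta = \frac{t-\delta_{\min}}{\delta}, \quad \delta_{\min} < t \le \delta_{\max},\] which is exactly the uniform distribution of $d$ we started from. The Python sketch below (hypothetical function names, parameters $\delta_{\min} = 500$, $\delta_{\max} = 2500$ as in the first figure) integrates $f_{e_y, e_x}$ numerically over a grid, compares the analytical $P(r \le t)$ with a Monte Carlo estimate, and illustrates the $1/r$ concentration near the inner radius.

```python
import math
import random


def donut_density(e_y, e_x, d_min, d_max):
    """Joint density f_{e_y, e_x} of the donut-mask coordinate errors."""
    r = math.hypot(e_y, e_x)
    if d_min < r <= d_max:
        return 1.0 / ((d_max - d_min) * 2.0 * math.pi * r)
    return 0.0


def integrate_density(d_min, d_max, step):
    """Midpoint-rule integral of the density over the square [-d_max, d_max]^2."""
    n = int(round(2.0 * d_max / step))
    total = 0.0
    for i in range(n):
        e_y = -d_max + (i + 0.5) * step
        for j in range(n):
            e_x = -d_max + (j + 0.5) * step
            total += donut_density(e_y, e_x, d_min, d_max)
    return total * step * step


def share_within(t, d_min, d_max, n, rng):
    """Monte Carlo share of simulated errors at distance <= t from the origin."""
    hits = 0
    for _ in range(n):
        alpha = rng.uniform(0.0, 2.0 * math.pi)
        d = rng.uniform(d_min, d_max)
        if math.hypot(math.sin(alpha) * d, math.cos(alpha) * d) <= t:
            hits += 1
    return hits / n


if __name__ == "__main__":
    d_min, d_max = 500.0, 2500.0
    print("integral over the plane:", round(integrate_density(d_min, d_max, 10.0), 4))
    t = 1000.0
    print("simulated P(r <= t):", share_within(t, d_min, d_max, 100_000, random.Random(7)))
    print("analytical P(r <= t):", (t - d_min) / (d_max - d_min))
    # The 1/r decay makes the density 4 times higher at r = 600 than at r = 2400
    print("density ratio:", donut_density(0.0, 600.0, d_min, d_max)
          / donut_density(0.0, 2400.0, d_min, d_max))
```

The grid integral is only approximately 1 because grid cells straddling the two circles are either fully counted or fully dropped; a finer `step` reduces this error.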

UPDATE 2026-07-16

In the meantime, the derivation in this post has been independently published by Gril et al. (2026), where it appears in Appendix A.

Literature

L. Gril, J. Hossain, N. Tzavidis, U. Rendtel, "Kernel density estimation under masking of geolocations with applications to DHS data," FU Berlin School of Business & Economics Discussion Paper 2026/3, 2026.

K.H. Hampton, M.K. Fitch, W.B. Allshouse, I.A. Doherty, D.C. Gesink, P.A. Leone, M.L. Serre, W.C. Miller, "Mapping health data: Improved privacy protection with donut method geomasking," American Journal of Epidemiology, vol.172, no.9, pp.1062-1069, 2010.

J. Hossain, Statistical Estimation and Inference with Aggregated and Displaced Georeferenced Data, PhD thesis, University of Southampton, 2023, online: https://eprints.soton.ac.uk/484015/.

V.K. Rohatgi, An Introduction to Probability Theory and Mathematical Statistics, John Wiley & Sons, 1976.

D. Swanlund, N. Schuurman, M. Brussoni, "MaskMy.XYZ: An easy-to-use tool for protecting geoprivacy using geographic masks," Transactions in GIS, vol.24, no.2, pp.390-401, 2020.

statshorts
