Now online: 'Guidelines for Statistical Disclosure Control Methods Applied on Geo-Referenced Data'

A guidelines document that I co-authored with colleagues from France, Austria, Poland, and the Netherlands is now available online. The Guidelines for Statistical Disclosure Control Methods Applied on Geo-Referenced Data are the result of extensive work under the STACE project (Statistical Methods and Tools for Time Series, Seasonal Adjustment and Statistical Disclosure Control). My co-authors are Julien Jamme, Edwin de Jonge, Andrzej Mlodak, Johannes Gussenbauer and Peter-Paul de Wolf.

The document covers statistical methods for protecting the confidentiality of data subjects when aggregates for small spatial areas are published. It extends and supplements the Handbook on Statistical Disclosure Control (now in its second edition).

Quoting from the Introduction:

"Users of statistical data are often interested in spatial distribution patterns. A policy-maker may be interested in the distribution of income over the neighborhoods of a city, a health care professional may want to know where to find incidences of infections and a social worker may want to focus on locations that are at high risk for social problems. While these are all examples of valid and relevant needs, they are at odds with confidentiality. When a location is too detailed, e.g. an address, or when its neighborhood has few inhabitants, e.g. one isolated household, the displayed information at that location is very disclosive. Thus, effective disclosure control methods to protect publications that involve spatial information are needed. [...] The current guidelines describe some measures to specify the disclosure risks involved with publications including a spatial dimension, statistical disclosure control methods to reduce the risk of disclosure while maintaining (some of) the utility, and ways to assess the amount of utility / information lost in the process." (Möhler et al., 2024, p.4, emphasis added)

From the contents

  • A thorough introduction to the conceptual specifics of statistical disclosure control (SDC) in geospatial settings. What role does the size of the geographical reference unit play? How is the potential to disclose sensitive information related to something called a 'sticky' population?
  • In-depth treatment of the problem of geographical differencing, a classical challenge for National Statistical Institutes (NSIs), with suggestions for its resolution.
  • Descriptions of protection methods, including those based on quadtree structures, on bivariate kernel smoothing, on random swapping and on noise addition.
  • Software and tools for privacy practitioners, including suggestions for improved error mapping and a comparison of distributional distance measures to assess changes in spatial pattern.
  • Case study (for the hands-on types) on creating safe gridded population data using the Cell Key Method (CKM), supplementing and building on a previous conference project.
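
To make the geographical differencing problem from the list above a little more concrete, here is a minimal sketch in Python. It is not taken from the guidelines or the accompanying R code; the grid counts, the two area definitions and the threshold of 10 are all hypothetical and chosen only for illustration. The idea is that two published totals can each look safe, while their difference isolates a very small population.

```python
# Hypothetical toy data: household counts on a 4x4 grid (not from the guidelines).
GRID = [
    [12, 30, 25, 18],
    [22, 41, 37, 1],
    [19, 28, 33, 20],
    [15, 24, 27, 16],
]

# Hypothetical minimum count below which a published total is considered unsafe.
THRESHOLD = 10


def area_total(cells):
    """Sum the counts over a set of (row, col) grid cells."""
    return sum(GRID[r][c] for r, c in cells)


def differencing_risk(area_a, area_b, threshold=THRESHOLD):
    """Return True if subtracting the totals of two nested areas
    exposes a non-empty leftover region with a count below the threshold."""
    if area_b <= area_a:
        leftover = area_a - area_b
    elif area_a <= area_b:
        leftover = area_b - area_a
    else:
        return False  # this sketch only handles nested areas
    return bool(leftover) and area_total(leftover) < threshold


# Area A: a district made of the top two grid rows.
area_a = {(r, c) for r in range(2) for c in range(4)}
# Area B: an alternative geography that covers A except for one cell.
area_b = area_a - {(1, 3)}

total_a = area_total(area_a)
total_b = area_total(area_b)
difference = total_a - total_b

print("Total A:", total_a, "safe on its own:", total_a >= THRESHOLD)
print("Total B:", total_b, "safe on its own:", total_b >= THRESHOLD)
print("A minus B:", difference, "differencing risk:", differencing_risk(area_a, area_b))
```

Both totals pass the simple threshold check, yet anyone holding both publications can recover the count of the single leftover cell by subtraction. This is one reason why protecting each table in isolation is not enough when several overlapping geographies are published.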

Additional information

Like a previous paper, these guidelines are a result of Eurostat's Centre of Excellence on Statistical Disclosure Control under the Collaboration in Research and Methodology for Official Statistics (CROS). The work was co-funded by the European Commission (grant agreement 899218, 2019-BG-Methodology).

The guidelines are distributed under the CC BY-SA 4.0 International license, meaning they are free to share, show and build on. The case study comes with R code that allows readers to reproduce the results and to incorporate some of the functionality into their own data confidentiality projects. The code is available from an accompanying GitHub repo.

UPDATE (June 1st, 2025)

The guidelines are now out as an official Eurostat publication. See also this post.

Literature

M. Möhler, J. Jamme, E. de Jonge, A. Mlodak, J. Gussenbauer, P.-P. de Wolf, "Guidelines for Statistical Disclosure Control Methods Applied on Geo-Referenced Data," WP2 of STACE project - Grant agreement 899218-2019-BG-Methodology, Task 2.4, Deliverable D2.9, 2024. Link
