CART synthesis of small-scale georeferences: an experiment using AMELIA data
Inspired by Drechsler & Hu (2021), we test a method for creating synthetic geodata based on classification and regression trees (CART).

Background

A common problem in providing microdata for public use is the risk of privacy violations. The problem is exacerbated if the data contain detailed geographic information, which is known to be particularly revealing (e.g. VanWey et al., 2005). One approach is to publish synthetic microdata instead (Drechsler, 2011): records generated from a model trained on the original data, so that important relationships are kept intact. Classification and regression trees were suggested for this task by Reiter (2005) and have subsequently shown promising results.

Suppose we want to synthesize variable $X_p$. We use CART to model, in a nonparametric fashion, the conditional distribution $f(X_p \mid \mathbf{X}_{-p})$, where $\mathbf{X}_{-p}$ is the matrix of predictor variables without the $p$th one. To make sure we have sufficient var...
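
To make the idea of modelling $f(X_p \mid \mathbf{X}_{-p})$ with a tree more concrete, here is a minimal, dependency-free Python sketch. It is not the code used in this experiment. The function names (`grow`, `leaf_values`, `synthesize`), the `min_leaf` parameter and the toy data are all hypothetical, and the sampling step is simplified to a plain draw with replacement from the observed values in the leaf a record falls into.

```python
import random
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (node impurity)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def grow(X, y, min_leaf=5):
    """Grow a toy regression tree; each leaf stores its observed y values."""
    best = None  # (impurity after split, feature index, threshold)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[1:]:
            left = [yi for row, yi in zip(X, y) if row[j] < t]
            right = [yi for row, yi in zip(X, y) if row[j] >= t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    # Stop if no admissible split exists or none reduces the impurity.
    if best is None or best[0] >= sse(y):
        return {"leaf": list(y)}
    _, j, t = best
    lo = [(row, yi) for row, yi in zip(X, y) if row[j] < t]
    hi = [(row, yi) for row, yi in zip(X, y) if row[j] >= t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([r for r, _ in lo], [v for _, v in lo], min_leaf),
        "right": grow([r for r, _ in hi], [v for _, v in hi], min_leaf),
    }


def leaf_values(tree, row):
    """Drop one record down the tree and return the y values in its leaf."""
    while "leaf" not in tree:
        side = "left" if row[tree["feature"]] < tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]


def synthesize(tree, X, rng):
    """Draw one synthetic value per record from the values in its leaf."""
    return [rng.choice(leaf_values(tree, row)) for row in X]


# Toy data (assumption): X_p depends strongly on the first predictor only.
rng = random.Random(0)
X = [[rng.choice([0, 1]), rng.random()] for _ in range(200)]
y = [10.0 * row[0] + rng.gauss(0, 1) for row in X]

tree = grow(X, y, min_leaf=10)
y_syn = synthesize(tree, X, rng)
```

Because every synthetic value is drawn from a leaf, it is always a value that was actually observed among records with similar predictors. This keeps the relationship between $X_p$ and $\mathbf{X}_{-p}$ roughly intact, but it also means that small leaves can reproduce original values almost one to one, which is why the minimum leaf size matters for disclosure risk.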