Consistent random sample queries using cell keys
Random sample querying is a method for protecting personal information when answering statistical queries. Rather than calculating the query response from all records in the database, a random sample is drawn and an estimate, based on sampling theory, is returned. The contribution of any individual data point to the answer thereby becomes uncertain, which protects the privacy of data subjects. The method has an undeniable charm, because sampling has intuitive and verifiable privacy-enhancing properties (Balle et al., 2018). Furthermore, sampling and sample-based estimation are well understood by statisticians. This should facilitate adoption, especially when other privacy-preserving mechanisms (such as output perturbation, where noise is added to the query answer) are viewed unfavourably as 'messing with the data.' However, random sample queries suffer from inconsistency issues, which have so far hindered their adoption. In this post, I show how the cell key mechanis...