Replication Data for: Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset
收藏NIAID Data Ecosystem2026-03-12 收录
下载链接:
https://doi.org/10.7910/DVN/UDFZJD
下载链接
链接失效反馈官方服务:
资源简介:
We offer methods to analyze the “differentially private” Facebook URLs Dataset which, at over 17 trillion cell values, is one of the largest social science research datasets ever constructed. The version of differential privacy used in the URLs dataset has specially calibrated random noise added, which provides mathematical guarantees for the privacy of individual research subjects while still making it pos- sible to learn about aggregate patterns of interest to social scientists. Unfortunately, random noise creates measurement error which induces statistical bias — includ- ing attenuation, exaggeration, switched signs, or incorrect uncertainty estimates. We adapt methods developed to correct for naturally occurring measurement error, with special attention to computational efficiency for large datasets. The result is statisti- cally valid linear regression estimates and descriptive statistics that can be interpreted as ordinary analyses of non-confidential data but with appropriately larger standard errors.
创建时间:
2021-09-04



