A Bias-Accuracy-Privacy Trilemma for Statistical Estimation

Source: NIAID Data Ecosystem (indexed 2026-05-02)

Download link:

https://figshare.com/articles/dataset/A_Bias-Accuracy-Privacy_Trilemma_for_Statistical_Estimation/28071708

Description:

Differential privacy (DP) is a rigorous notion of data privacy, used for private statistics. The canonical algorithm for differentially private mean estimation is to first clip the samples to a bounded range and then add noise to their empirical mean. Clipping controls the sensitivity and, hence, the variance of the noise that we add for privacy. But clipping also introduces statistical bias. This tradeoff is inherent: we prove that no algorithm can simultaneously have low bias, low error, and low privacy loss for arbitrary distributions. Additionally, we show that under strong notions of DP (i.e., pure or concentrated DP), unbiased mean estimation is impossible, even if we assume that the data is sampled from a Gaussian. On the positive side, we show that unbiased mean estimation is possible under a more permissive notion of differential privacy (approximate DP) if we assume that the distribution is symmetric. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
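
The clip-then-noise estimator described above can be sketched as follows. This is a minimal illustration, not the paper's code, and it rests on these assumptions:

- It uses the Laplace mechanism for pure epsilon-DP.
- Neighbouring datasets differ by replacing one sample, with n known.
- The clipping interval [lo, hi] is fixed in advance.

All function names are hypothetical. The helper exp1_bias_and_noise uses an assumed example, X ~ Exp(1) clipped to [0, c]. For that case the clipping bias is exactly -exp(-c), while the noise standard deviation grows linearly in c.

```python
import math
import random
from typing import Optional, Sequence, Tuple


def clip(x: float, lo: float, hi: float) -> float:
    """Project x onto the interval [lo, hi]."""
    return min(max(x, lo), hi)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale); the difference of two Exp(1) draws is Laplace(0, 1)."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def clipped_laplace_mean(
    samples: Sequence[float],
    lo: float,
    hi: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Hypothetical epsilon-DP mean estimator: clip, average, add Laplace noise.

    Assumes the replace-one neighbouring relation with n known: swapping one
    clipped sample moves the empirical mean by at most (hi - lo) / n, so
    Laplace noise of scale (hi - lo) / (n * epsilon) yields pure epsilon-DP.
    """
    if rng is None:
        rng = random.Random()
    n = len(samples)
    if n == 0 or epsilon <= 0 or hi < lo:
        raise ValueError("need n > 0, epsilon > 0 and lo <= hi")
    clipped_mean = sum(clip(x, lo, hi) for x in samples) / n
    sensitivity = (hi - lo) / n  # bound on how far one sample can move the mean
    return clipped_mean + laplace_noise(sensitivity / epsilon, rng)


def exp1_bias_and_noise(c: float, n: int, epsilon: float) -> Tuple[float, float]:
    """Bias and noise std of the estimator above for X ~ Exp(1) clipped to [0, c].

    The noise has mean zero, so E[estimate] = E[min(X, c)] = 1 - exp(-c) while
    the true mean is 1; the bias is therefore -exp(-c). The Laplace noise has
    standard deviation sqrt(2) * c / (n * epsilon). Shrinking one grows the other.
    """
    bias = -math.exp(-c)
    noise_std = math.sqrt(2.0) * c / (n * epsilon)
    return bias, noise_std


if __name__ == "__main__":
    # Illustrative parameters only: n = 1000 samples, epsilon = 0.1.
    for c in (1.0, 5.0, 20.0):
        bias, noise_std = exp1_bias_and_noise(c, n=1000, epsilon=0.1)
        print(f"c={c:>5}: bias={bias:+.2e}, noise std={noise_std:.3f}")
```

With n = 1000 and epsilon = 0.1, raising c from 1 to 20 shrinks the bias from about -0.37 to about -2e-9. The same change raises the noise standard deviation from about 0.014 to about 0.28. This single-distribution example only shows the tension for one estimator. It does not reproduce the paper's result that no algorithm can achieve low bias, low error, and low privacy loss simultaneously for arbitrary distributions.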

Created:

2024-12-20