Data from: Aim high, stay private: Differentially private synthetic data enables public release of behavioral health information with high utility
Source: DataCite Commons | Updated: 2026-04-27 | Indexed: 2026-04-25
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.zgmsbcct8
Description:
Sharing behavioral health and wearable data poses privacy challenges, as
traditional de-identification remains vulnerable to re-identification.
Differential privacy (DP) provides mathematical guarantees through a
tunable privacy budget, ϵ. This study evaluates the feasibility of
generating and releasing DP synthetic behavioral health data with high
analytical utility, identifying practical ϵ values for public data
sharing. We analyzed physiological data from wearable devices and
self-reported data from Phase 1 of the Lived Experiences Measured Using
Rings Study (LEMURS), which tracked sleep, stress, and well-being among
first-year college students. Three DP synthetic data generators (AIM, MST,
and PATECTGAN) were evaluated across privacy budgets ranging from ϵ = 1 to
100. Utility was assessed using L1/L2 errors, correlation, regression, and
UMAP, and vulnerability was assessed via privacy attacks. Results: AIM
outperformed MST and PATECTGAN in preserving both statistical and
analytical properties of the original data. For the Survey dataset, the
lowest marginal errors occurred at ϵ = 5 and 10. Correlation, regression,
and UMAP analyses confirmed that AIM-generated data closely replicated the
original relationships at moderate ϵ values. The choice of privacy budget
remains an open question; it is task-agnostic and dataset-specific.
Moderate privacy budgets (5 ≤ ϵ ≤ 10) maintained key associations between
physiological and psychological measures while ensuring privacy. AIM’s
workload-aware design effectively allocated noise toward relevant
features, enhancing performance. A privacy budget of ϵ = 5 offers a
practical balance between data utility and participant privacy for LEMURS
behavioral health data sharing.
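The abstract treats ϵ as a tunable privacy budget. To make concrete what ϵ controls, here is a minimal sketch of the simplest ϵ-DP release, a histogram of counts perturbed with the Laplace mechanism. This is not one of the generators evaluated in the study; the function name `laplace_histogram` and the counts are hypothetical, and the sketch assumes each participant contributes exactly one record (so the L1 sensitivity is 1).

```python
import numpy as np


def laplace_histogram(counts, epsilon, rng):
    """Release histogram counts under epsilon-DP with the Laplace mechanism.

    Assumes each participant contributes exactly one record, so adding or
    removing one participant changes a single bin by 1 (L1 sensitivity 1).
    The noise scale is therefore sensitivity / epsilon = 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    counts = np.asarray(counts, dtype=float)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    return counts + noise


rng = np.random.default_rng(0)
# Hypothetical counts, e.g. participants per binned sleep-duration category.
true_counts = np.array([120, 340, 95, 45])
for eps in (1, 5, 10, 100):
    noisy = laplace_histogram(true_counts, eps, rng)
    # The expected absolute error per bin is 1 / eps.
    print(f"eps={eps:>3}: {np.round(noisy, 1)}")
```

Because the expected absolute noise per bin is 1/ϵ, moving from ϵ = 1 to ϵ = 10 cuts it tenfold. Synthesizers such as AIM and MST spend their budget across many noisy measurements, so each individual measurement receives a smaller share of ϵ than this single release suggests.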
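Utility is reported partly as L1/L2 marginal errors. A minimal sketch of such a metric, assuming errors are computed on normalized k-way marginals and averaged over every k-column subset (the study's exact workload and normalization are not stated here). Records are plain dicts, and the function names and toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def marginal(rows, cols):
    """Normalized k-way marginal: fraction of records in each cell of `cols`."""
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    n = len(rows)
    return {cell: c / n for cell, c in counts.items()}


def marginal_errors(real, synth, cols):
    """L1 and L2 distance between the real and synthetic marginals on `cols`."""
    p, q = marginal(real, cols), marginal(synth, cols)
    cells = set(p) | set(q)  # a cell absent from one dataset has mass 0 there
    diffs = [p.get(c, 0.0) - q.get(c, 0.0) for c in cells]
    return sum(abs(d) for d in diffs), sqrt(sum(d * d for d in diffs))


def workload_errors(real, synth, columns, k=2):
    """Average L1/L2 error over all k-way marginals (a hypothetical workload)."""
    subsets = list(combinations(columns, k))
    l1 = l2 = 0.0
    for cols in subsets:
        e1, e2 = marginal_errors(real, synth, cols)
        l1 += e1
        l2 += e2
    return l1 / len(subsets), l2 / len(subsets)


# Hypothetical toy records standing in for survey responses.
real = [
    {"sleep": "short", "stress": "high", "mood": "low"},
    {"sleep": "long", "stress": "low", "mood": "high"},
    {"sleep": "long", "stress": "low", "mood": "high"},
    {"sleep": "short", "stress": "low", "mood": "low"},
]
synth = [
    {"sleep": "short", "stress": "high", "mood": "low"},
    {"sleep": "long", "stress": "high", "mood": "high"},
    {"sleep": "long", "stress": "low", "mood": "high"},
    {"sleep": "long", "stress": "low", "mood": "low"},
]
print(workload_errors(real, synth, ["sleep", "stress", "mood"], k=2))
```

Lower values mean the synthetic data reproduce the joint distributions more closely; an identical copy scores 0 on both metrics.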
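The abstract also checks whether synthetic data preserve correlations between physiological and psychological measures. One simple way to quantify this, sketched here under the assumption that every variable is a numeric column in the same order in both datasets (the study's exact correlation metric is not given in this summary), is the absolute difference between Pearson correlation matrices. The function name and data below are hypothetical.

```python
import numpy as np


def correlation_gap(real, synth):
    """Compare Pearson correlation matrices of real and synthetic data.

    `real` and `synth` are 2-D arrays whose columns are the same variables
    in the same order; rows are observations. Returns the mean and maximum
    absolute difference over the off-diagonal entries.
    """
    r = np.corrcoef(real, rowvar=False)
    s = np.corrcoef(synth, rowvar=False)
    off_diag = ~np.eye(r.shape[0], dtype=bool)
    gap = np.abs(r - s)[off_diag]
    return gap.mean(), gap.max()


rng = np.random.default_rng(0)
# Hypothetical data: columns stand in for heart rate, sleep hours, stress score.
real = rng.normal(size=(200, 3))
real[:, 2] = -0.6 * real[:, 1] + 0.8 * rng.normal(size=200)
synth = rng.normal(size=(200, 3))  # stand-in for generator output
print(correlation_gap(real, synth))
```

Here the synthetic stand-in ignores the sleep/stress dependence, so the gap is driven by that pair; a generator that preserved it would show a much smaller maximum.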
Provider:
Dryad
Created:
2026-04-17



