Synthetic Data in Communication Sciences and Disorders: Promoting an Open, Reproducible, and Cumulative Science

Name: Synthetic Data in Communication Sciences and Disorders: Promoting an Open, Reproducible, and Cumulative Science
Creator: osf.io
Published: 2024-10-20 00:00:00
License: No description available

osf.io · Updated 2024-10-20 · Indexed 2025-03-26

Download link:

https://osf.io/yhkqf

Resource description:

Reproducibility is a core principle of science, and access to a study’s data is essential for reproducing its findings. However, data sharing is uncommon in the field of Communication Sciences and Disorders (CSD), often because of concerns about privacy and disclosure risks. Synthetic data offer a potential solution to this barrier: artificial datasets are generated that do not represent real individuals yet retain the statistical properties and relationships of the original data. This study evaluates the performance of synthetic data generation using open data from previously published studies across the American Speech-Language-Hearing Association (ASHA) ‘Big Nine’ domains. Findings suggest that synthetic data can effectively maintain statistical properties and relationships across a wide range of data commonly seen in CSD. Some studies with fewer observations than recommended (i.e., n < 130) showed lower agreement and greater variability in p-values and effect size estimates, but this pattern was not observed consistently. Researchers who use synthetic data should therefore assess how stably it preserves their results. The study concludes with a general framework for sharing open data to facilitate computational reproducibility and foster a cumulative science in CSD.

Provider:

osf.io
