Attribute occurrence across Data Sources.

NIAID Data Ecosystem, indexed 2026-05-10

Download link:

https://figshare.com/articles/dataset/Attribute_occurrence_across_Data_Sources_/30544593

Description:

Data for healthcare applications are typically customized for specific purposes but are often difficult to access due to high costs and privacy concerns. Rather than prepare separate datasets for individual applications, we propose a novel approach: building a general-purpose generative model applicable to virtually any type of healthcare application. This generative model encompasses a broad range of human attributes, including age, sex, anthropometric measurements, blood components, physical performance metrics, and numerous healthcare-related questionnaire responses. To achieve this goal, we integrated the results of multiple clinical studies into a unified training dataset and developed a generative model to replicate its characteristics. The model can estimate missing attribute values from known attribute values and generate synthetic datasets for various applications. Our analysis confirmed that the model captures key statistical properties of the training dataset, including univariate distributions and bivariate relationships. We demonstrate the model’s practical utility through multiple real-world applications, illustrating its potential impact on predictive, preventive, and personalized medicine.

Created:

2025-11-05
