
NHANES 1988-2018

DataCite Commons · Updated 2025-06-01 · Indexed 2024-08-26
Download link:
https://figshare.com/articles/dataset/NHANES_1988-2018/21743372/3
Description:
The National Health and Nutrition Examination Survey (NHANES) provides data on the health and environmental exposures of the non-institutionalized US population. Such data have considerable potential for understanding how the environment and behaviors affect human health, and they are already used to answer public health questions such as disease prevalence. However, the data must first be processed before new insights can be derived through large-scale analyses. NHANES data are stored across hundreds of files with multiple inconsistencies. Correcting these inconsistencies requires systematic cross-examination and considerable effort, but it is necessary for accurately and reproducibly characterizing the associations between the exposome and diseases (e.g., cancer mortality outcomes). We therefore developed a set of curated, unified datasets and accompanying code by merging 614 separate files and harmonizing the unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 134,310 participants and 4,740 variables. The variables cover 1) demographic information, 2) dietary consumption, 3) physical examination results, 4) occupation, 5) questionnaire items (e.g., physical activity, general health status, medical conditions), 6) medications, 7) mortality status linked from the National Death Index, 8) survey weights, 9) environmental exposure biomarker measurements, and 10) chemical comments that indicate which measurements are below or above the lower limit of detection.

We also provide a data dictionary listing the variables and their descriptions to help researchers browse the data, as well as R Markdown files with example code for calculating summary statistics and running regression models, to help accelerate high-throughput analyses of the exposome and of secular trends in cancer mortality.

CSV Data Record: The curated NHANES datasets and data dictionaries comprise 13 .csv files and 1 Excel file.
The curated NHANES datasets consist of 10 .csv files, one per module, labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. The 11th file ("dictionary_nhanes.csv") is a dictionary that lists, for all 4,740 NHANES variables, the variable name, description, module, category, units, CAS number, comment use, chemical family, shortened chemical family, number of measurements, and cycles available. The 12th file ("dictionary_harmonized_categories.csv") contains the harmonized categories for the categorical variables. The 13th file ("dictionary_drug_codes.csv") is a dictionary of descriptors for the drug codes. The 14th file ("nhanes_inconsistencies_documentation.xlsx") is an Excel workbook containing the cleaning documentation, which records every inconsistency found in the affected variables and supports the curation of each NHANES dataset.

R Data Record: For researchers who want to conduct their analysis in R, the curated NHANES datasets and data dictionaries can also be downloaded as a .zip file that includes an .RData file and an .R file. The .RData file ("w - nhanes_1988_2018.RData") contains all of the aforementioned datasets as R data objects, together with the custom R functions that were written to curate the data. The .R file ("m - nhanes_1988_2018.R") shows how these custom functions were used (i.e., the curation pipeline) to curate the data.
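Because the .csv record splits participants' data across module files, a typical first step is to load modules and join them on a participant identifier. The following is a minimal, standard-library Python sketch of that step, not the authors' pipeline. It assumes, hypothetically, that each module file has a header row and a participant ID column named `SEQN` (the NHANES respondent sequence number), and that the dictionary file has columns headed `variable_name` and `module`; all of these names should be checked against the actual files. It also assumes one row per participant, which may not hold for modules such as medications (where a participant can report several drugs), so `read_module` raises an error on duplicate IDs instead of silently dropping rows.

```python
import csv
import io
from typing import Dict, List

# Assumption (hypothetical): every module .csv identifies participants with a
# "SEQN" column. Verify against the real headers before relying on this.
PARTICIPANT_KEY = "SEQN"


def read_module(handle) -> Dict[str, dict]:
    """Read one module .csv into {participant_id: row}, one row per participant."""
    rows: Dict[str, dict] = {}
    for row in csv.DictReader(handle):
        pid = row[PARTICIPANT_KEY]
        if pid in rows:
            # Some modules (e.g. medications) may hold several rows per
            # participant; fail loudly instead of keeping only the last row.
            raise ValueError(f"duplicate participant ID {pid!r}")
        rows[pid] = row
    return rows


def join_modules(modules: List[Dict[str, dict]]) -> Dict[str, dict]:
    """Inner-join modules on participant ID, keeping the first module's row order.

    If two modules share a non-key column name, the later module's value wins.
    """
    if not modules:
        raise ValueError("need at least one module")
    first, rest = modules[0], modules[1:]
    joined: Dict[str, dict] = {}
    for pid, row in first.items():
        if all(pid in other for other in rest):
            merged = dict(row)
            for other in rest:
                merged.update(other[pid])
            joined[pid] = merged
    return joined


def variables_in_module(dictionary_handle, module: str) -> List[str]:
    """List variable names for one module from a dictionary .csv.

    The "variable_name" and "module" headers are hypothetical placeholders.
    """
    return [
        row["variable_name"]
        for row in csv.DictReader(dictionary_handle)
        if row["module"] == module
    ]


# Toy in-memory data standing in for two module files.
demographics = read_module(io.StringIO("SEQN,RIDAGEYR\n1,45\n2,60\n3,33\n"))
mortality = read_module(io.StringIO("SEQN,MORTSTAT\n1,0\n3,1\n"))
joined = join_modules([demographics, mortality])
print(list(joined))  # ['1', '3']
```

With real files, each `io.StringIO(...)` would be replaced by an `open(path, newline="")` handle; the file names themselves are not given in this description. An inner join keeps only participants present in every module, so an outer join may be preferable when, for example, mortality linkage is missing for some participants.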
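The description lists survey weights and chemical comments that flag measurements relative to the lower limit of detection. As an illustration only, here is a minimal standard-library Python sketch of two common first summaries: the detection frequency of a chemical computed from its comment codes, and a survey-weighted mean (which, applied to a 0/1 indicator, gives a weighted prevalence). The comment coding used here (0 = at or above the limit, 1 = below) is a hypothetical convention to verify against the dictionary's "comment use" information, and the function names are illustrative, not part of the dataset's own code.

```python
from typing import Optional, Sequence

# Hypothetical comment coding; confirm it against the dictionary's
# "comment use" information before use:
#   0 = at or above the lower limit of detection, 1 = below it.
BELOW_LOD = 1


def detection_frequency(comment_codes: Sequence[Optional[int]]) -> float:
    """Fraction of non-missing measurements at or above the lower limit of detection."""
    observed = [code for code in comment_codes if code is not None]
    if not observed:
        raise ValueError("no non-missing comment codes")
    detected = sum(1 for code in observed if code != BELOW_LOD)
    return detected / len(observed)


def weighted_mean(
    values: Sequence[Optional[float]], weights: Sequence[Optional[float]]
) -> float:
    """Survey-weighted mean sum(w * x) / sum(w), skipping pairs with a missing entry.

    Applied to a 0/1 indicator, this gives a weighted prevalence.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    pairs = [(x, w) for x, w in zip(values, weights) if x is not None and w is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(x * w for x, w in pairs) / total_weight


# Toy data: comment codes for one chemical, and a 0/1 condition with weights.
print(detection_frequency([0, 1, 0, None, 0]))  # 0.75
print(weighted_mean([1, 0, 1, 0], [2.0, 1.0, 1.0, 4.0]))  # 0.375
```

The weighted mean is a point estimate only. Standard errors for NHANES estimates require the survey's design variables (strata and primary sampling units), for example through R's `survey` package, and pooling several cycles requires adjusting the weights as described in the NHANES analytic guidelines.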

Provider:
figshare
Created:
2023-01-09
Background overview:
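The NHANES 1988-2018 dataset is a comprehensive health and environmental exposure dataset containing 4,740 variables for 134,310 members of the non-institutionalized US population surveyed between 1988 and 2018, covering demographics, diet, physical examinations, occupation, and other areas. The data are provided in several formats, including .csv and R formats, to facilitate large-scale analyses.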
The above content was collected and summarized by 遇见数据集.