
Cleaned NHANES 1988-2018

DataCite Commons | Updated 2025-06-01 | Indexed 2025-05-07
Download link:
https://figshare.com/articles/dataset/NHANES_1988-2018/21743372/8
Resource description:
The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential for studying the health and environmental exposures of the non-institutionalized US population. However, NHANES data contain many inconsistencies, so they must be processed before new insights can be derived through large-scale analyses. We therefore developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 135,310 participants and 5,078 variables. The variables cover demographics (281 variables), dietary consumption (324 variables), physiological functions (1,040 variables), occupation (61 variables), questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood), medications (29 variables), mortality information linked from the National Death Index (15 variables), survey weights (857 variables), environmental exposure biomarker measurements (598 variables), and chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).

CSV Data Record: The curated NHANES datasets and the data dictionaries comprise 23 .csv files and 1 Excel file. The curated NHANES datasets account for 20 of the .csv files, two for each module: one uncleaned version and one cleaned version. The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES. "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables. "dictionary_drug_codes.csv" contains the dictionary of descriptors for the drug codes. "nhanes_inconsistencies_documentation.xlsx" is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables and was used to help curate each of the NHANES modules.

R Data Record: For researchers who want to conduct their analysis in the R programming language, the cleaned NHANES modules and the data dictionaries (cleaned versions only) can be downloaded as a .zip file that includes an .RData file and an .R file. "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We also make available all R scripts containing the customized functions that were written to curate the data. "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e., our pipeline) to curate the original NHANES data.

Example starter codes: The starter code to help users conduct exposome analyses consists of four R Markdown files (.Rmd). We recommend going through the tutorials in order. "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together. "example_1 - account_for_nhanes_design.Rmd" demonstrates how to fit a linear regression model, a survey-weighted regression model, a Cox proportional hazards model, and a survey-weighted Cox proportional hazards model. "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and for multiple variables, with and without accounting for the NHANES sampling design. "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
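
The starter tutorials themselves are written in R and are not reproduced here. As a rough illustration of two ideas they cover (merging modules by participant, and how a survey weight changes a summary statistic), the following is a minimal Python sketch using only the standard library. The column names (`participant_id`, `survey_weight`, `blood_lead`) and all values are invented for illustration and are not taken from the curated files; consult "dictionary_nhanes.csv" for the actual variable names. The weighted mean shown is only a point estimate; standard errors that respect the NHANES sampling design require the design-based methods demonstrated in the tutorials.

```python
import csv
import io

# Toy stand-ins for two cleaned module files. Column names and values are
# hypothetical; the real names are documented in dictionary_nhanes.csv.
DEMOGRAPHICS_CSV = """participant_id,age,survey_weight
1,34,1000
2,58,3000
3,45,2000
"""

CHEMICALS_CSV = """participant_id,blood_lead
1,1.0
2,3.0
4,2.5
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per participant row."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join(left, right, key):
    """Keep participants present in both modules and merge their columns."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


def weighted_mean(rows, value_col, weight_col):
    """Weighted mean: sum(w * x) / sum(w)."""
    total_weight = sum(float(r[weight_col]) for r in rows)
    weighted_sum = sum(float(r[value_col]) * float(r[weight_col]) for r in rows)
    return weighted_sum / total_weight


merged = inner_join(
    read_rows(DEMOGRAPHICS_CSV), read_rows(CHEMICALS_CSV), "participant_id"
)
unweighted = sum(float(r["blood_lead"]) for r in merged) / len(merged)
weighted = weighted_mean(merged, "blood_lead", "survey_weight")

print(f"participants in both modules: {len(merged)}")
print(f"unweighted mean: {unweighted:.2f}")
print(f"weighted mean:   {weighted:.2f}")
```

In this toy data only participants 1 and 2 appear in both modules, and the participant with the larger weight pulls the weighted mean above the unweighted one. To try the same idea on the real data, replace the in-memory strings with the cleaned module files downloaded from the figshare record and the placeholder column names with those listed in the data dictionary.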

Provider:
figshare
Created:
2025-02-18