Cleaned NHANES 1988-2018

Name: Cleaned NHANES 1988-2018
Creator: figshare
Published: 2025-06-01 05:40:35
License: No description available

DataCite Commons; updated 2025-06-01; indexed 2024-08-19

Download link:

https://figshare.com/articles/dataset/NHANES_1988-2018/21743372/7

Resource description:

The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential for studying the health and environmental exposures of the non-institutionalized US population. However, because NHANES data are plagued with multiple inconsistencies, they must be processed before new insights can be derived through large-scale analyses. We therefore developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey demographics (281 variables), dietary consumption (324 variables), physiological functions (1,040 variables), occupation (61 variables), questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood), medications (29 variables), mortality information linked from the National Death Index (15 variables), survey weights (857 variables), environmental exposure biomarker measurements (598 variables), and chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).

CSV Data Record: The curated NHANES datasets and the data dictionaries include 23 .csv files and 1 Excel file. The curated NHANES datasets comprise 20 .csv files, two for each module: one uncleaned version and one cleaned version. The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES. "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables. "dictionary_drug_codes.csv" contains the dictionary of descriptors for the drug codes. "nhanes_inconsistencies_documentation.xlsx" is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.

R Data Record: For researchers who want to conduct their analysis in the R programming language, the cleaned NHANES modules and the data dictionaries (cleaned versions only) can be downloaded as a .zip file that includes an .RData file and an .R file. "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We also make available all R scripts containing the customized functions that were written to curate the data. "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e., our pipeline) to curate the original NHANES data.

Example starter code: The starter code to help users conduct exposome analysis consists of four R Markdown files (.Rmd). We recommend going through the tutorials in order. "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together. "example_1 - account_for_nhanes_design.Rmd" demonstrates how to fit a linear regression model, a survey-weighted regression model, a Cox proportional hazards model, and a survey-weighted Cox proportional hazards model. "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and for multiple variables, with and without accounting for the NHANES sampling design. "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
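Because each module is distributed as its own .csv file, most analyses start by joining modules on a participant identifier. The following is a minimal sketch of such a left join in Python, not a reproduction of the authors' R tutorial. It assumes the identifier column is named SEQN (the respondent sequence number used by NHANES); the actual key in the curated files should be checked against "dictionary_nhanes.csv". The in-memory data, the column names `age` and `chemical_a`, and the helper functions are hypothetical and exist only to keep the example self-contained.

```python
import csv
import io

# Toy stand-ins for two cleaned module files. The values are made up, and
# the column names (SEQN, age, chemical_a) are assumptions for illustration.
DEMOGRAPHICS_CSV = """SEQN,age
1,34
2,51
3,67
"""

CHEMICALS_CSV = """SEQN,chemical_a
1,0.8
3,1.9
"""


def read_module(text, key="SEQN"):
    """Index the rows of one module by participant identifier."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def left_merge(base, other, key="SEQN"):
    """Left-join `other` onto `base`, keeping every participant in `base`."""
    other_columns = sorted({c for row in other.values() for c in row if c != key})
    merged = []
    for participant_id, row in base.items():
        combined = dict(row)
        match = other.get(participant_id, {})
        for column in other_columns:
            # None marks a participant with no record in the other module.
            combined[column] = match.get(column)
        merged.append(combined)
    return merged


demographics = read_module(DEMOGRAPHICS_CSV)
chemicals = read_module(CHEMICALS_CSV)
merged = left_merge(demographics, chemicals)

for row in merged:
    print(row)
```

A left join anchored on the demographics module keeps every participant, which matters because not every participant has measurements in every module. To use real files, replace the `io.StringIO` objects with file handles opened on the downloaded .csv files.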

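The description distinguishes summary statistics computed with and without accounting for the NHANES sampling design. As a rough illustration of why the two differ, the sketch below compares a plain mean with a weighted mean, computed as the sum of weight times value divided by the sum of weights. The values and weights are invented, and the function names are hypothetical. The sketch covers only the point estimate: design-based standard errors also need the strata and sampling-unit information, which is what dedicated survey software handles.

```python
def unweighted_mean(values):
    """Plain arithmetic mean that ignores the sampling design."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Invented measurements and survey weights for three participants.
values = [10.0, 20.0, 30.0]
weights = [1.0, 1.0, 2.0]

plain = unweighted_mean(values)             # (10 + 20 + 30) / 3
weighted = weighted_mean(values, weights)   # (10 + 20 + 60) / 4

print(plain, weighted)
```

Here the third participant represents twice as many people as each of the others, so the weighted estimate is pulled toward that participant's value.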

Provider:

figshare

Created:

2024-05-16

Collection Summary

Dataset Introduction