Into the research multiverse: How decisions about data can affect the results of biomedical studies that integrate community-level data
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/95954hvxvj
Description:
Integrating electronic health record data with environmental data has the potential to enrich biomedical research with new insights into the relationship between health and environment. However, the data preparation process carries implications that have not been fully explored. The objectives of this study were to (a) determine whether and how different data preparation decisions applied to the same integrated dataset affected the results of the analyses and (b) identify which decisions introduced the most variability.
For this study, we repurposed a dataset from a prior study that examined the association between poor air quality days caused by wildfire smoke and pulmonary exacerbations in people with cystic fibrosis. The clinical dataset was created by querying the Cystic Fibrosis Foundation Patient Registry and pulling the data of patients treated at Oregon Health & Science University's Cystic Fibrosis Care Center and Doernbecher Children's Hospital from 2010 to 2019 (inclusive). Community-level fine particulate matter (PM2.5) data were obtained from the EPA's Air Quality System DataMart.
We developed an algorithm that ran the same dataset through a variety of plausible data preparation decisions and produced the same statistical output for each resulting analysis. We compared point estimate odds ratios, confidence intervals, and p-values, and evaluated how data preparation approaches affected the characteristics of the resulting patient cohorts.
A total of 135 data preparation pathways generated 93 unique odds ratios, of which 26 appeared more than once in the results. The resulting odds ratios ranged from 0.83 to 2.93, with a mean of 1.31 (SD ±0.37). More than half (50.37%) of the results had a p-value ≤0.05. Different data preparation decisions removed up to 87.23% of patients and 93.51% of patient days. The percentage of patient days contributed by patients living in urban areas varied from 67.54% to 98.73% across pathways.
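The pathway-enumeration idea described above can be illustrated with a small Python sketch. It is not the authors' algorithm: the three decisions (`pm25_threshold`, `min_days_per_patient`, `missing_monitor`), their options, the synthetic patient-days, and the crude 2x2 odds ratio with a Wald 95% confidence interval are all hypothetical stand-ins chosen for illustration, and the study's own statistical model is not reproduced. Every combination of options is run through the same analysis, and the outputs are summarized the way the abstract reports them (unique and repeated odds ratios, range, mean, SD, share of p-values ≤0.05). A simple per-decision spread is included as one possible way to ask which decision introduces the most variability.

```python
import itertools
import math
import random
import statistics
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical decision space: each key is one data preparation decision and
# each list holds its plausible options. Names and values are illustrative only.
DECISIONS = {
    "pm25_threshold": [35.5, 55.5],          # ug/m3 cut-off for a "poor air quality day"
    "min_days_per_patient": [0, 40],         # drop patients with fewer recorded days
    "missing_monitor": ["drop", "nearest"],  # handling of days with no PM2.5 reading
}

@dataclass
class PatientDay:
    patient_id: int
    pm25: Optional[float]  # None when the assigned monitor has no reading that day
    nearest_pm25: float    # hypothetical fallback reading from a neighbouring monitor
    exacerbation: bool

def make_synthetic_days(seed: int = 0, n_patients: int = 50) -> list:
    """Synthetic patient-days for illustration only; not registry or EPA data."""
    rng = random.Random(seed)
    days = []
    for pid in range(n_patients):
        for _ in range(rng.randint(10, 60)):  # unequal follow-up per patient
            true_pm = rng.lognormvariate(3.0, 0.8)
            risk = 0.10 if true_pm >= 35.5 else 0.05  # built-in exposure effect
            pm = None if rng.random() < 0.1 else true_pm  # ~10% missing readings
            fallback = true_pm * rng.uniform(0.8, 1.2)  # noisy neighbouring monitor
            days.append(PatientDay(pid, pm, fallback, rng.random() < risk))
    return days

def prepare(days, pm25_threshold, min_days_per_patient, missing_monitor):
    """Apply one pathway; return one (exposed, outcome) pair per kept day."""
    recorded = Counter(d.patient_id for d in days)
    rows = []
    for d in days:
        if recorded[d.patient_id] < min_days_per_patient:
            continue
        pm = d.pm25
        if pm is None:
            if missing_monitor == "drop":
                continue
            pm = d.nearest_pm25
        rows.append((pm >= pm25_threshold, d.exacerbation))
    return rows

def odds_ratio(rows):
    """Crude 2x2 odds ratio with a Wald 95% CI and two-sided p-value."""
    cells = Counter(rows)  # keys are (exposed, outcome) pairs
    a, b = cells[(True, True)], cells[(True, False)]
    c, d = cells[(False, True)], cells[(False, False)]
    if 0 in (a, b, c, d):  # Haldane-Anscombe correction for empty cells
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    p = math.erfc(abs(log_or) / se / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return math.exp(log_or), ci, p

def run_multiverse(days):
    results = []
    for options in itertools.product(*DECISIONS.values()):  # every pathway once
        pathway = dict(zip(DECISIONS, options))
        rows = prepare(days, **pathway)
        or_, ci, p = odds_ratio(rows)
        results.append({**pathway, "or": or_, "ci": ci, "p": p, "n_days": len(rows)})
    return results

def summarize(results):
    ors = [round(r["or"], 2) for r in results]  # compare ORs at two decimals
    tally = Counter(ors)
    return {
        "pathways": len(results),
        "unique_ors": len(tally),
        "repeated_ors": sum(1 for n in tally.values() if n > 1),
        "or_range": (min(ors), max(ors)),
        "or_mean": statistics.mean(ors),
        "or_sd": statistics.stdev(ors),
        "share_p_le_05": sum(r["p"] <= 0.05 for r in results) / len(results),
    }

def spread_by_decision(results):
    # Max minus min of the mean OR across each decision's options.
    spread = {}
    for name, opts in DECISIONS.items():
        means = [statistics.mean(r["or"] for r in results if r[name] == opt) for opt in opts]
        spread[name] = max(means) - min(means)
    return spread

if __name__ == "__main__":
    results = run_multiverse(make_synthetic_days())
    print(summarize(results))
    print(spread_by_decision(results))
```

Here 2 × 2 × 2 options give 8 pathways rather than the study's 135. Counting unique odds ratios after rounding to two decimals is also an assumption, since the abstract does not state the precision at which point estimates were compared.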
Created:
2024-10-01