Association Between Air Pollution and Child Brain Development: A Machine Learning Based Environmental Modeling Study
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/jyrkr8cdy9
Description:
This study examines the relationship between air pollution exposure and child brain development using machine learning–based environmental modeling. Air pollution is increasingly recognized as a determinant of neurodevelopmental outcomes, particularly during early life when the brain undergoes rapid structural and functional changes. The project integrates simulated multi-source pollution data with advanced analytical techniques to characterize complex, nonlinear exposure–response relationships.
A synthetic dataset of 100 exposure scenarios was developed to represent realistic variations in pollutant levels originating from industrial operations, vehicular traffic, dust storms, and wildfires. Pollutants included NOx, CO, CO₂, dust PM₁₀, and wildfire PM₂.₅. Emission levels were generated within typical environmental ranges. A Total Exposure Index was calculated as a weighted sum of pollutants to represent cumulative exposure intensity. The Predicted Impact (%) variable was simulated using a nonlinear function incorporating pollutant interactions and controlled random noise, ensuring reproducibility through fixed random seeds.
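The generation procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual code: the pollutant ranges, the index weights, the form of the nonlinear function, the interaction term, and the noise level are all hypothetical placeholders, since the description does not state them.

```python
import math
import random

# Hypothetical pollutant ranges (arbitrary units); NOT taken from the dataset.
RANGES = {
    "NOx": (10.0, 200.0),
    "CO": (0.1, 10.0),
    "CO2": (350.0, 1000.0),
    "dust_PM10": (10.0, 300.0),
    "wildfire_PM25": (5.0, 150.0),
}

# Hypothetical weights for the Total Exposure Index; they sum to 1.
WEIGHTS = {
    "NOx": 0.20,
    "CO": 0.15,
    "CO2": 0.10,
    "dust_PM10": 0.25,
    "wildfire_PM25": 0.30,
}


def generate_scenarios(n=100, seed=42):
    """Return n synthetic exposure scenarios as a list of dicts."""
    rng = random.Random(seed)  # fixed seed makes the output reproducible
    rows = []
    for _ in range(n):
        row = {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}

        # Scale each pollutant to [0, 1] so the weights are comparable.
        scaled = {
            name: (row[name] - lo) / (hi - lo)
            for name, (lo, hi) in RANGES.items()
        }

        # Total Exposure Index: weighted sum of the scaled pollutants.
        index = sum(WEIGHTS[name] * scaled[name] for name in WEIGHTS)

        # Assumed nonlinear response: a saturating curve plus one
        # pollutant interaction term and Gaussian noise.
        interaction = scaled["NOx"] * scaled["wildfire_PM25"]
        signal = 100.0 * (1.0 - math.exp(-1.5 * index)) + 10.0 * interaction
        noise = rng.gauss(0.0, 2.0)

        row["total_exposure_index"] = index
        row["predicted_impact_pct"] = min(100.0, max(0.0, signal + noise))
        rows.append(row)
    return rows


scenarios = generate_scenarios()
```

Calling `generate_scenarios()` twice with the same seed returns identical rows, which is the reproducibility property the description refers to.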
Each dataset entry was labeled with one of several machine learning techniques (Random Forest, XGBoost, SVM, Neural Networks, Gradient Boosting, Linear Regression) to illustrate methodological diversity. Analyses were performed using Python and standard open-source libraries in a reproducible Jupyter Notebook environment. Exploratory analysis included pairwise scatter plots, correlation heatmaps, and boxplots to visualize relationships and compare model behaviors.
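As a rough illustration of the computation behind a correlation heatmap, the sketch below builds a Pearson correlation matrix using only the Python standard library. The column names and the toy data are hypothetical and stand in for the dataset's actual variables; a notebook would more typically use a dataframe library and a plotting package for the same step.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict of pairwise Pearson correlations between named columns."""
    names = list(columns)
    return {
        a: {b: pearson(columns[a], columns[b]) for b in names}
        for a in names
    }


# Toy data (hypothetical): impact depends on exposure, "unrelated" does not.
rng = random.Random(0)
exposure = [rng.uniform(0.0, 1.0) for _ in range(100)]
impact = [60.0 * e + rng.gauss(0.0, 3.0) for e in exposure]
unrelated = [rng.uniform(0.0, 1.0) for _ in range(100)]

matrix = correlation_matrix(
    {"exposure": exposure, "impact": impact, "unrelated": unrelated}
)
```

Each cell `matrix[a][b]` is what a heatmap would color; the diagonal is 1 and the matrix is symmetric.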
A Random Forest model with cross-validation was applied to assess predictive performance and feature importance. Results indicated that the Total Exposure Index was the dominant predictor, followed by particulate matter and CO. Correlation analysis revealed a strong positive relationship between total exposure and predicted impact, emphasizing the significance of overall pollutant burden in shaping modeled neurodevelopmental outcomes.
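A cross-validated Random Forest with feature importances might look like the sketch below, assuming scikit-learn and NumPy are available. The synthetic data, the weights, the feature names, the number of folds, and the hyperparameters are all assumptions made for illustration; the description does not specify the authors' settings, and the importances this sketch produces should not be read as the study's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples = 100

# Five pollutants scaled to [0, 1] (hypothetical stand-in data).
pollutants = rng.uniform(0.0, 1.0, size=(n_samples, 5))
weights = np.array([0.20, 0.15, 0.10, 0.25, 0.30])  # assumed index weights
exposure_index = pollutants @ weights

# Assumed nonlinear response with one interaction term and Gaussian noise.
impact = (
    100.0 * (1.0 - np.exp(-1.5 * exposure_index))
    + 10.0 * pollutants[:, 0] * pollutants[:, 4]
    + rng.normal(0.0, 2.0, size=n_samples)
)

feature_names = [
    "NOx", "CO", "CO2", "dust_PM10", "wildfire_PM25", "total_exposure_index",
]
features = np.column_stack([pollutants, exposure_index])

model = RandomForestRegressor(n_estimators=200, random_state=0)

# 5-fold cross-validated R^2 as a measure of predictive performance.
cv_scores = cross_val_score(model, features, impact, cv=5, scoring="r2")

# Fit on all data to read off impurity-based feature importances.
model.fit(features, impact)
importances = dict(zip(feature_names, model.feature_importances_))
ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
```

Because the index is itself a weighted sum of the other features, the importances are shared among correlated predictors; permutation importance on held-out folds is a common cross-check in that situation.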
Although synthetic, this dataset demonstrates the utility of machine learning approaches in environmental health research. It provides a structured, transparent, and reproducible framework that can be adapted to real-world monitoring data. The methodology supports method development, sensitivity analysis, and policy-relevant modeling, aligning with current trends in data-driven environmental epidemiology. By highlighting key exposure drivers and interactions, the study contributes to a growing evidence base informing public health protection and regulatory decision-making in the context of vulnerable populations, particularly children. The approach can also be extended to diverse geographical regions or pollutant mixtures. This enhances its value as a flexible educational and research resource.
Created:
2025-10-21



