
Interpretable Machine Learning to Understand Wildfire Toxicity: Bridging Chemicals, Omics, and Toxicological Outcomes via Symbolic Regression with Novel Feature Scoring

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Interpretable_Machine_Learning_to_Understand_Wildfire_Toxicity_Bridging_Chemicals_Omics_and_Toxicological_Outcomes_via_Symbolic_Regression_with_Novel_Feature_Scoring/31932478
Description:
Wildfire smoke exposures are increasingly common, consisting of complex mixtures of gases and particulates known to cause diverse pulmonary health effects. While health outcomes are regularly studied, quantitative links between smoke chemical composition and toxicological outcomes remain poorly defined, limiting interpretation of wildfire smoke health risks. This study explores symbolic regression (SR) as an interpretable artificial intelligence/machine learning method to generate closed-form mathematical models linking chemical exposure to biological responses relevant to wildfire smoke. Prior to application on wildfire-relevant data sets, we benchmarked three Python-based SR packages on simulated data, assessing performance across varying noise levels and operator complexities. Insights from these simulation tests, such as the importance of including necessary operators, were incorporated when applying SR to lab-generated wildland fire exposure-toxicity data. This data set included chemical characterizations of biomass smoke exposures and corresponding pulmonary responses in female CD-1 mice (n = 60). Specifically, we evaluated the ability to predict a lung injury marker using (1) targeted measures of over 80 chemicals measured in smoke (RMSE = 17.57 mg/mL) and (2) lung tissue measures of hundreds of transcripts (RMSE = 15.12 mg/mL). Resulting error metrics were comparable to Random Forest and XGBoost models. To aid model interpretation, we developed directional ensemble contribution scores (DECS), a novel feature importance scoring method that quantifies the direction and magnitude of predictor contributions across top-performing models. Expert toxicologists also contributed to model prioritization, integrating a “biologists-in-the-loop” approach. Results highlighted polycyclic aromatic hydrocarbons as drivers of lung injury and methoxyphenols as suppressors. Transcriptomic analyses highlighted a small set of genes, which have roles in metabolism, cell proliferation, immune regulation, and oncogenic processes, with MYC proto-oncogene (Myc) showing the strongest association. Overall, this study demonstrates SR and associated DECS as practical, interpretable tools for modeling environmental mixtures, such as wildfire smoke, and their toxicological effects.
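To make the idea of a closed-form model with signed feature contributions concrete, the following is a minimal sketch on synthetic data, not the study's method. It assumes a tiny, hypothetical operator set and uses exhaustive search instead of the evolutionary search that SR packages typically use. The `directional_scores` helper is only an illustrative perturbation-based stand-in: the description above does not give the DECS formula, so this should not be read as DECS itself. All names (`UNARY`, `BINARY`, `search`, `directional_scores`, `x0`..`x2`) are assumptions introduced for this example.

```python
import itertools
import math
import random

# Hypothetical operator sets for this toy search; not taken from the study.
UNARY = {
    "id": lambda v: v,
    "sqrt": lambda v: math.sqrt(abs(v)),    # protected against negatives
    "log1p": lambda v: math.log1p(abs(v)),  # protected against negatives
}
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def rmse(pred, y):
    """Root-mean-square error between predictions and targets."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))


def fit_affine(raw, y):
    """Closed-form least squares for y ~ a * raw + b."""
    n = len(y)
    mean_r, mean_y = sum(raw) / n, sum(y) / n
    var = sum((r - mean_r) ** 2 for r in raw)
    if var == 0.0:
        return 0.0, mean_y
    slope = sum((r - mean_r) * (t - mean_y) for r, t in zip(raw, y)) / var
    return slope, mean_y - slope * mean_r


def make_model(i, j, f1, f2, g, a, b):
    """Bind one candidate expression into a callable row -> prediction."""
    return lambda row: a * g(f1(row[i]), f2(row[j])) + b


def search(X, y, names, top_k=3):
    """Enumerate a * (u1(xi) op u2(xj)) + b and keep the top_k by RMSE."""
    found = []
    for i, j in itertools.combinations(range(len(names)), 2):
        for (n1, f1), (n2, f2), (op, g) in itertools.product(
            UNARY.items(), UNARY.items(), BINARY.items()
        ):
            raw = [g(f1(row[i]), f2(row[j])) for row in X]
            a, b = fit_affine(raw, y)
            model = make_model(i, j, f1, f2, g, a, b)
            expr = f"{a:.3g} * ({n1}({names[i]}) {op} {n2}({names[j]})) + {b:.3g}"
            found.append((rmse([model(row) for row in X], y), expr, model))
    found.sort(key=lambda c: c[0])
    return found[:top_k]


def directional_scores(models, X, n_features):
    """Illustrative signed importance (an assumption, not the published DECS):
    mean change in prediction when a feature is raised by one standard
    deviation, averaged over the supplied models."""
    scores = [0.0] * n_features
    for k in range(n_features):
        col = [row[k] for row in X]
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        for model in models:
            delta = 0.0
            for row in X:
                bumped = list(row)
                bumped[k] += sd
                delta += model(bumped) - model(row)
            scores[k] += delta / len(X) / len(models)
    return scores


# Synthetic data: x0 raises the response, x1 lowers it, x2 is irrelevant.
random.seed(0)
names = ["x0", "x1", "x2"]
X = [[random.uniform(1.0, 5.0) for _ in names] for _ in range(60)]
y = [4.0 * (r[0] - r[1]) + 10.0 + random.gauss(0.0, 0.1) for r in X]

top = search(X, y, names)
scores = directional_scores([m for _, _, m in top], X, len(names))
for err, expr, _ in top:
    print(f"RMSE={err:.3f}  {expr}")
print(dict(zip(names, (round(s, 3) for s in scores))))
```

In this toy setting the sign of each score separates a driver from a suppressor, and a feature that appears in none of the retained expressions scores zero. Real SR tools search a far larger expression space, so the operator set offered to the search matters in the way the simulation benchmark above suggests.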
Created:
2026-04-03