Interpretable Machine Learning to Understand Wildfire Toxicity: Bridging Chemicals, Omics, and Toxicological Outcomes via Symbolic Regression with Novel Feature Scoring
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Interpretable_Machine_Learning_to_Understand_Wildfire_Toxicity_Bridging_Chemicals_Omics_and_Toxicological_Outcomes_via_Symbolic_Regression_with_Novel_Feature_Scoring/31932478
Resource description:
Wildfire smoke exposures are increasingly common and consist of
complex mixtures of gases and particulates known to cause diverse
pulmonary health effects. While health outcomes are regularly studied,
quantitative links between smoke chemical composition and toxicological
outcomes remain poorly defined, limiting interpretation of wildfire
smoke health risks. This study explores symbolic regression (SR) as
an interpretable artificial intelligence/machine learning method to
generate closed-form mathematical models linking chemical exposure
to biological responses relevant to wildfire smoke. Prior to application
on wildfire-relevant data sets, we benchmarked three Python-based
SR packages on simulated data, assessing performance across varying
noise levels and operator complexities. Insights from these simulation
tests, such as the importance of including necessary operators, were
incorporated when applying SR to lab-generated wildland fire exposure-toxicity
data. This data set included chemical characterizations of biomass
smoke exposures and corresponding pulmonary responses in female CD-1
mice (n = 60). Specifically, we evaluated the ability
to predict a lung injury marker using (1) targeted measures of over
80 chemicals measured in smoke (RMSE = 17.57 mg/mL) and (2) lung tissue
measures of hundreds of transcripts (RMSE = 15.12 mg/mL). Resulting
error metrics were comparable to Random Forest and XGBoost models.
To aid model interpretation, we developed directional ensemble contribution
scores (DECS), a novel feature importance scoring method that quantifies
the direction and magnitude of predictor contributions across top-performing
models. Expert toxicologists also contributed to model prioritization,
integrating a “biologists-in-the-loop” approach. Results
highlighted polycyclic aromatic hydrocarbons as drivers of lung injury
and methoxyphenols as suppressors. Transcriptomic analyses highlighted
a small set of genes, which have roles in metabolism, cell proliferation,
immune regulation, and oncogenic processes, with MYC proto-oncogene
(Myc) showing the strongest association. Overall,
this study demonstrates SR and associated DECS as practical, interpretable
tools for modeling environmental mixtures, such as wildfire smoke,
and their toxicological effects.
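
To make the idea of scoring predictor direction and magnitude across an ensemble of closed-form models more concrete, the following is a minimal toy sketch. It is not the authors' DECS definition, which this summary does not give. It assumes a simple stand-in: average the finite-difference slope of each model's prediction with respect to each feature, then average across models. The three model expressions, their coefficients, the feature names `pah` and `methoxyphenol`, and the simulated data are all hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical closed-form models of the kind symbolic regression returns.
# Expressions and coefficients are invented for illustration only.
MODELS = [
    lambda x: 2.0 * x["pah"] - 1.0 * x["methoxyphenol"] + 5.0,
    lambda x: 1.5 * x["pah"] - 0.5 * x["methoxyphenol"] ** 2 + 6.0,
    lambda x: 3.0 * math.log1p(x["pah"]) - 0.8 * x["methoxyphenol"] + 4.0,
]
FEATURES = ["pah", "methoxyphenol"]


def rmse(model, rows, targets):
    """Root-mean-square error of one model over a data set."""
    squared = [(model(r) - t) ** 2 for r, t in zip(rows, targets)]
    return math.sqrt(sum(squared) / len(squared))


def signed_sensitivity(model, rows, feature, step=1e-3):
    """Mean central finite-difference slope of the prediction for one feature."""
    total = 0.0
    for r in rows:
        up = dict(r, **{feature: r[feature] + step})
        down = dict(r, **{feature: r[feature] - step})
        total += (model(up) - model(down)) / (2 * step)
    return total / len(rows)


def ensemble_scores(models, rows, features):
    """Average the signed sensitivities over every model in the ensemble."""
    return {
        f: sum(signed_sensitivity(m, rows, f) for m in models) / len(models)
        for f in features
    }


# Simulated data (assumption): a known linear ground truth plus Gaussian noise.
rng = random.Random(0)
rows = [
    {"pah": rng.uniform(0.5, 2.0), "methoxyphenol": rng.uniform(0.5, 2.0)}
    for _ in range(60)
]
targets = [
    2.0 * r["pah"] - 1.0 * r["methoxyphenol"] + 5.0 + rng.gauss(0, 0.1)
    for r in rows
]

errors = [rmse(m, rows, targets) for m in MODELS]
scores = ensemble_scores(MODELS, rows, FEATURES)

for i, e in enumerate(errors):
    print(f"model {i}: RMSE = {e:.3f}")
for name, s in scores.items():
    print(f"{name}: signed score = {s:+.3f}")
```

In this toy setup a positive score marks a feature that raises the predicted response across the ensemble and a negative score marks one that lowers it. A real analysis would restrict the ensemble to top-performing models and would likely weight or normalize contributions, choices this sketch leaves out.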
Created:
2026-04-03



