Direct Prediction of Bioaccumulation of Organic Contaminants in Plant Roots from Soils with Machine Learning Models Based on Molecular Structures
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://figshare.com/articles/dataset/Direct_Prediction_of_Bioaccumulation_of_Organic_Contaminants_in_Plant_Roots_from_Soils_with_Machine_Learning_Models_Based_on_Molecular_Structures/17121912
Description:
Root concentration factor (RCF) is an important characterization parameter for describing the accumulation of organic contaminants in plants from soils, both in life cycle impact assessment (LCIA) and in assessing phytoremediation potential. However, building robust predictive models remains challenging because of the complex interactions within chemical–soil–plant root systems. Here we developed end-to-end machine learning models to unravel the complex relationship between molecular structure and RCF, training them on a unified RCF data set of 341 data points covering 72 chemicals. We demonstrate the efficacy of the proposed gradient boosting regression tree (GBRT) model based on extended connectivity fingerprints (ECFP) for predicting RCF values, achieving an R-squared of 0.77 and a mean absolute error (MAE) of 0.22 under 5-fold cross-validation. In addition, our results reveal nonlinear relationships among chemical, soil, and plant properties. Further in-depth analyses identify the key chemical topological substructures (e.g., −O, −Cl, aromatic rings, and large conjugated π systems) related to RCF. Owing to its simplicity and universality, the GBRT-ECFP model provides a valuable tool for LCIA and other environmental assessments to better characterize chemical risks to human health and ecosystems.
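The description outlines a pipeline of molecular fingerprint features, gradient boosting regression trees, and 5-fold cross-validation scored by R-squared and MAE. The pure-Python sketch below illustrates that pipeline under loudly stated assumptions: the 0/1 bit vectors are hypothetical stand-ins for ECFP fingerprints (a real pipeline would normally generate them with a cheminformatics toolkit), the booster uses depth-1 trees (stumps) with squared loss rather than the deeper trees a library GBRT grows, and the function names (`fit_boosted_stumps`, `k_fold_cv`) and hyperparameters (`n_rounds=50`, `lr=0.1`) are illustrative only, not the authors' settings. It does not use or reproduce the RCF data set.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bits = Sequence[int]  # hypothetical stand-in for an ECFP bit vector (0/1 per bit)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_boosted_stumps(
    X: List[Bits], y: List[float], n_rounds: int = 50, lr: float = 0.1
) -> Callable[[Bits], float]:
    """Gradient boosting with squared loss; each weak learner is a depth-1 tree
    (stump) splitting on one fingerprint bit. Simplification: library GBRT
    implementations grow deeper trees."""
    base = sum(y) / len(y)  # initial prediction: mean of the training targets
    pred = [base] * len(y)
    stumps: List[Tuple[int, float, float]] = []  # (bit, value if 0, value if 1)
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        resid = [t - p for t, p in zip(y, pred)]
        best = None
        for j in range(len(X[0])):
            on = [r for x, r in zip(X, resid) if x[j]]
            off = [r for x, r in zip(X, resid) if not x[j]]
            if not on or not off:
                continue  # bit is constant in this training set; no split possible
            m_on, m_off = sum(on) / len(on), sum(off) / len(off)
            sse = sum((r - m_on) ** 2 for r in on) + sum((r - m_off) ** 2 for r in off)
            if best is None or sse < best[0]:
                best = (sse, j, m_off, m_on)
        if best is None:
            break
        _, j, m_off, m_on = best
        stumps.append((j, m_off, m_on))
        # Shrunken (learning-rate-scaled) update of the training predictions.
        pred = [p + lr * (m_on if x[j] else m_off) for x, p in zip(X, pred)]

    def predict(x: Bits) -> float:
        return base + lr * sum(v1 if x[b] else v0 for b, v0, v1 in stumps)

    return predict


def k_fold_cv(X: List[Bits], y: List[float], k: int = 5, seed: int = 0) -> Tuple[float, float]:
    """Return (R^2, MAE) computed over pooled out-of-fold predictions."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    y_true: List[float] = []
    y_pred: List[float] = []
    for f in range(k):
        test = idx[f::k]  # every k-th shuffled index forms one fold
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        model = fit_boosted_stumps([X[i] for i in train], [y[i] for i in train])
        for i in test:
            y_true.append(y[i])
            y_pred.append(model(X[i]))
    return r2_score(y_true, y_pred), mae(y_true, y_pred)


if __name__ == "__main__":
    # Synthetic toy data only (NOT the RCF data set): 16 random bits per
    # "molecule", with a target driven by two bits plus small noise.
    rng = random.Random(1)
    X = [[rng.randint(0, 1) for _ in range(16)] for _ in range(60)]
    y = [1.0 * x[0] - 0.5 * x[3] + rng.gauss(0, 0.05) for x in X]
    r2, err = k_fold_cv(X, y, k=5)
    print(f"5-fold CV: R^2 = {r2:.2f}, MAE = {err:.2f}")
```

Scoring pooled out-of-fold predictions is one convention; averaging the per-fold R-squared and MAE is another, and the description does not say which was used for the reported 0.77 and 0.22.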
Created:
2021-12-03



