
Direct Prediction of Bioaccumulation of Organic Contaminants in Plant Roots from Soils with Machine Learning Models Based on Molecular Structures

Source: Figshare. Updated: 2021-12-03. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Direct_Prediction_of_Bioaccumulation_of_Organic_Contaminants_in_Plant_Roots_from_Soils_with_Machine_Learning_Models_Based_on_Molecular_Structures/17121912
Description:
Root concentration factor (RCF) is an important characterization parameter for describing the accumulation of organic contaminants in plants from soils in life cycle impact assessment (LCIA) and phytoremediation potential assessment. However, building robust predictive models remains challenging because of the complex interactions within chemical–soil–plant root systems. Here we developed end-to-end machine learning models to unravel the complex relationship between molecular structure and RCF by training on a unified RCF data set of 341 data points covering 72 chemicals. We demonstrate the efficacy of the proposed gradient boosting regression tree (GBRT) model based on extended connectivity fingerprints (ECFP): it predicts RCF values with an R-squared of 0.77 and a mean absolute error (MAE) of 0.22 under 5-fold cross-validation. In addition, our results reveal nonlinear relationships among the properties of chemicals, soils, and plants. Further in-depth analyses identify the key chemical topological substructures (e.g., −O, −Cl, aromatic rings, and large conjugated π systems) related to RCF. Owing to its simplicity and universality, the GBRT-ECFP model provides a valuable tool for LCIA and other environmental assessments to better characterize chemical risks to human health and ecosystems.
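The evaluation protocol in the description (a gradient boosting regressor on fingerprint bit vectors, scored by R-squared and MAE under 5-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the feature matrix is a synthetic stand-in for ECFP bits, the target is a made-up function of those bits, and the hyperparameters and the 128-bit width are assumptions. Only the row count of 341 is borrowed from the description. With real data, the bit vectors would come from a cheminformatics toolkit and the targets from the data set itself.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)

# ASSUMPTION: random 0/1 features standing in for ECFP bit vectors.
# 341 rows mirrors the data set size; 128 bits is an arbitrary choice.
n_samples, n_bits = 341, 128
X = rng.integers(0, 2, size=(n_samples, n_bits))

# ASSUMPTION: synthetic target, NOT real RCF values. A few bits contribute
# linearly, two bits interact, and a small amount of noise is added.
weights = np.zeros(n_bits)
weights[:8] = [1.0, -0.8, 0.6, 0.5, -0.4, 0.3, 0.2, -0.2]
y = X @ weights + 0.5 * X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=n_samples)

# Gradient boosting regression trees; hyperparameters are illustrative only.
model = GradientBoostingRegressor(
    n_estimators=200, max_depth=3, learning_rate=0.1, random_state=0
)

# 5-fold cross-validation: each sample is predicted by a model that did not
# see it during training, then the metrics are computed on those predictions.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
y_pred = cross_val_predict(model, X, y, cv=cv)

r2 = r2_score(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print(f"5-fold CV R^2: {r2:.2f}, MAE: {mae:.2f}")
```

The scores printed here describe the synthetic data only and are not comparable to the R-squared of 0.77 and MAE of 0.22 reported in the description.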
Created: 2021-12-03