Machine Learning-Based Models with High Accuracy and Broad Applicability Domains for Screening PMT/vPvM Substances
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://figshare.com/articles/dataset/Machine_Learning-Based_Models_with_High_Accuracy_and_Broad_Applicability_Domains_for_Screening_PMT_vPvM_Substances/21687447
Description:
Persistent, mobile, and toxic (PMT) substances and very persistent and very mobile (vPvM) substances can be transported over long distances from various sources, increasing the risk to public health. Rapid, high-throughput screening of PMT/vPvM substances is therefore needed to support risk prevention and mitigation measures. Herein, we construct a machine learning-based screening system that integrates five models for the high-throughput classification of PMT/vPvM substances. The models are built from 44,971 substances using conventional learning, deep learning, and ensemble learning algorithms; among these, LightGBM and XGBoost outperform the other algorithms, with metrics exceeding 0.900.
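The abstract does not say how the five models are combined into a single PMT/vPvM verdict. One plausible reading, stated here purely as an assumption, is that each model predicts one of the five hazard criteria used in PMT/vPvM assessment (P, vP, M, vM, T), and the final label follows the criteria logic: a substance is PMT if it meets P, M and T, and vPvM if it meets vP and vM. The minimal sketch below shows only that final combination step; `EndpointPredictions` and `classify` are hypothetical names, not part of the published system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointPredictions:
    """Hypothetical container for five binary model outputs for one substance (assumed endpoints)."""
    persistent: bool       # P criterion
    very_persistent: bool  # vP criterion
    mobile: bool           # M criterion
    very_mobile: bool      # vM criterion
    toxic: bool            # T criterion


def classify(pred: EndpointPredictions) -> set[str]:
    """Return the hazard classes implied by the assumed P/vP/M/vM/T scheme."""
    labels: set[str] = set()
    # PMT requires persistence, mobility and toxicity together.
    if pred.persistent and pred.mobile and pred.toxic:
        labels.add("PMT")
    # vPvM requires very high persistence and very high mobility; toxicity is not needed.
    if pred.very_persistent and pred.very_mobile:
        labels.add("vPvM")
    return labels


# Example: a very persistent, very mobile, but non-toxic substance.
example = EndpointPredictions(persistent=True, very_persistent=True,
                              mobile=True, very_mobile=True, toxic=False)
print(sorted(classify(example)))  # ['vPvM']
```

Keeping the criteria logic separate from the individual classifiers means each endpoint model can be retrained or swapped without changing how the final PMT/vPvM label is derived.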
Good model interpretability is achieved: the number of free halogen atoms (fr_halogen) and the logarithm of the partition coefficient (MolLogP) emerge as the two most important molecular descriptors, representing the persistence and mobility of substances, respectively. The screening system shows strong generalization capability, with an area under the receiver operating characteristic curve (AUROC) above 0.951, and is successfully applied to persistent organic pollutants (POPs), prioritized PMT/vPvM substances, and pesticides. The screening system constructed in this study can serve as an efficient and reliable tool for high-throughput risk assessment and for prioritizing the management of emerging contaminants.
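AUROC, the metric used above to report generalization, equals the probability that a randomly chosen positive substance receives a higher model score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition using only the standard library; the labels and scores are made-up illustrative values, not data from the study.

```python
from itertools import product


def auroc(labels: list[int], scores: list[float]) -> float:
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    # Compare every positive score against every negative score.
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only: 1 = PMT/vPvM, 0 = not; scores are predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.80, 0.40, 0.60, 0.30, 0.10]
print(round(auroc(labels, scores), 3))  # 0.889 (8 of 9 pairs ranked correctly)
```

The pairwise loop is O(n_pos × n_neg), which is fine for illustration; for a dataset the size of the one described here, a rank-based formula or an established implementation such as scikit-learn's `roc_auc_score` is more practical.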
Created:
2022-12-07