Machine Learning Models Based on Enlarged Chemical Spaces for Screening Carcinogenic Chemicals
Source: NIAID Data Ecosystem. Indexed: 2026-05-02
Download link:
https://figshare.com/articles/dataset/Machine_Learning_Models_Based_on_Enlarged_Chemical_Spaces_for_Screening_Carcinogenic_Chemicals/29429775
Description:
Machine learning (ML) models for screening carcinogenic chemicals
are critical for the sound management of chemicals. Previous models
were built on small-scale datasets and lacked applicability domain
(AD) characterization that is necessary for regulatory applications
of the models. In the current study, an enlarged dataset containing
1697 compounds (940 carcinogens and 757 non-carcinogens) was curated
and employed to construct screening models based on 12 types of molecular
fingerprints, four ML algorithms, and two graph neural networks. The
AD of the optimal model was defined by a state-of-the-art characterization
methodology (ADSAL) based on the analysis of structure-activity
landscapes (SALs). Results showed that an optimal model based on the
random forest algorithm with the PubChem fingerprints outperformed
previous ones, with an area under the receiver operating characteristic
curve of 86.2% on the validation-set compounds falling within the
ADSAL-defined AD. The optimal model, coupled with the ADSAL, was employed
to screen carcinogenic chemicals in the Inventory of Existing Chemical
Substances of China (IECSC) and plastic additives datasets, identifying
1282 chemicals from the IECSC and 841 plastic additives as carcinogenic
chemicals. The screening model coupled with ADSAL may serve
as a promising tool for prioritizing chemicals of carcinogenic concern,
facilitating the sound management of chemicals.
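
The reported metric is an AUC computed only on validation compounds inside the applicability domain. The sketch below illustrates that evaluation pattern in plain Python. It is not the ADSAL method: as a labeled assumption, it substitutes a much simpler nearest-neighbor Tanimoto similarity cutoff for the domain check, and the fingerprints, labels, scores, threshold, and function names are all hypothetical toy values rather than data from the study.

```python
# Illustrative sketch only: a Tanimoto nearest-neighbor cutoff stands in for
# ADSAL (assumption), and all data below are hypothetical toy values.

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def in_domain(query, training_fps, threshold=0.5):
    """Simplified AD check: the nearest training compound is similar enough."""
    return max(tanimoto(query, fp) for fp in training_fps) >= threshold

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_within_domain(val_fps, labels, scores, training_fps, threshold=0.5):
    """Return (AUC on in-domain validation compounds, number of compounds kept)."""
    keep = [i for i, fp in enumerate(val_fps)
            if in_domain(fp, training_fps, threshold)]
    auc = roc_auc([labels[i] for i in keep], [scores[i] for i in keep])
    return auc, len(keep)

# Hypothetical toy data: 1 = carcinogen, 0 = non-carcinogen.
training_fps = [{1, 2, 3, 4}, {5, 6, 7, 8}]
val_fps = [{1, 2, 3}, {5, 6, 7, 9}, {1, 2, 3, 4}, {20, 21}, {6, 7, 8}]
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.3, 0.95, 0.2]  # model-predicted carcinogen probability

auc_all = roc_auc(labels, scores)
auc_ad, n_kept = auc_within_domain(val_fps, labels, scores, training_fps)
print(f"AUC, all compounds: {auc_all:.2f}")
print(f"AUC, in-domain only: {auc_ad:.2f} ({n_kept} of {len(val_fps)} kept)")
```

In this toy case the fourth compound shares no bits with any training compound, so it is excluded as out of domain. Its confident but wrong score no longer counts against the model, which is why a domain-restricted AUC can exceed the unrestricted one.
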
Created:
2025-06-27



