
Machine Learning Models Based on Enlarged Chemical Spaces for Screening Carcinogenic Chemicals

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://figshare.com/articles/dataset/Machine_Learning_Models_Based_on_Enlarged_Chemical_Spaces_for_Screening_Carcinogenic_Chemicals/29429775
Resource description:
Machine learning (ML) models for screening carcinogenic chemicals are critical for the sound management of chemicals. Previous models were built on small-scale datasets and lacked applicability domain (AD) characterization that is necessary for regulatory applications of the models. In the current study, an enlarged dataset containing 1697 compounds (940 carcinogens and 757 non-carcinogens) was curated and employed to construct screening models based on 12 types of molecular fingerprints, four ML algorithms, and two graph neural networks. The AD of the optimal model was defined by a state-of-the-art characterization methodology (ADSAL) based on the analysis of structure-activity landscapes (SALs). Results showed that an optimal model based on the random forest algorithm with the PubChem fingerprints outperformed previous ones, with an area under the receiver operating characteristic curve of 86.2% on the validation set imposed with the ADSAL. The optimal model, coupled with the ADSAL, was employed to screen carcinogenic chemicals in the Inventory of Existing Chemical Substances of China (IECSC) and plastic additives datasets, identifying 1282 chemicals from the IECSC and 841 plastic additives as carcinogenic chemicals. The screening model coupled with ADSAL may serve as a promising tool for prioritizing chemicals of carcinogenic concern, facilitating the sound management of chemicals.
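The description reports model quality as an area under the receiver operating characteristic curve (AUC). As a minimal sketch of what that metric measures, the snippet below computes AUC as the fraction of (carcinogen, non-carcinogen) pairs in which the carcinogen receives the higher predicted score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations; they are not taken from the dataset or from the authors' code.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative.

    labels: iterable of 0/1 class labels (1 = carcinogen in this toy setting).
    scores: iterable of predicted scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not from the study.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of compounds, which is acceptable for a dataset of this size; a rank-based implementation would be preferable for much larger sets.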
Created:
2025-06-27