Enhancing Toxicity Prediction of Synthetic Chemicals via Novel SMILES Fragmentation and Interpretable Deep Learning

Source: Figshare. Updated: 2025-08-26. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Enhancing_Toxicity_Prediction_of_Synthetic_Chemicals_via_Novel_SMILES_Fragmentation_and_Interpretable_Deep_Learning/29988190
Description:
Toxicity prediction and identification of structural alerts (SAs) for synthetic chemicals are critical for assessing risks to environmental and human health. Traditional methods, which rely heavily on molecular descriptors, often suffer from poor interpretability. Here, we introduce a novel framework that integrates SMILES fragmentation strategies with a 1D convolutional neural network deep learning model (denoted as SFDL) for predicting chemical toxicity and identifying the associated SAs. Four distinct fragmentation methods (single-atom, single-symbol, atom-centered, and symbol-centered) were evaluated to generate tokenizers (denoted as GenTok) from 581,537 high-interest PubChem compounds. The symbol-centered fragmentation approach demonstrated superior performance on the ISSSTY AMES mutagenicity data set (AUC = 0.87, PRAUC = 0.90). The SFDL-GenTok strategy demonstrated robust predictive performance on 6 of the 10 toxicity end points (AUC = 0.81 to 0.93, PRAUC = 0.70 to 0.94). Based on these models, toxicity predictions were made for 28,160 synthetic chemicals. Potentially toxic compounds were subsequently categorized into three groups: endocrine disruption, mutagenicity, and mitochondrial toxicity. SA analysis revealed that halogenated fragments, nitro or phenolic groups, and reactive electrophilic motifs are critical contributors to endocrine disruption, mitochondrial toxicity, and mutagenicity. This study provides an interpretable tool for toxicity prediction and SA identification of synthetic chemicals.
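The description above names the fragmentation methods but does not spell out their rules, so the following is only a minimal sketch of what single-symbol and symbol-centered fragmentation of a SMILES string might look like. The regular expression, the function names `single_symbol_tokens` and `symbol_centered_fragments`, and the `radius` parameter are all illustrative assumptions, not the authors' implementation; the actual GenTok tokenizers may differ.

```python
import re

# Assumed symbol pattern (not taken from the paper): bracket atoms,
# two-letter halogens, organic-subset atoms, aromatic atoms,
# ring-closure labels, and bond/branch punctuation.
SMILES_SYMBOL = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[=#\-+\\/().:~@*$]"
)


def single_symbol_tokens(smiles):
    """Split a SMILES string into one token per symbol."""
    tokens = SMILES_SYMBOL.findall(smiles)
    # Reject input containing characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def symbol_centered_fragments(smiles, radius=1):
    """Hypothetical symbol-centered scheme: for each symbol, emit the
    substring made of that symbol plus up to `radius` neighbours on
    each side."""
    tokens = single_symbol_tokens(smiles)
    fragments = []
    for i in range(len(tokens)):
        start = max(0, i - radius)
        fragments.append("".join(tokens[start:i + radius + 1]))
    return fragments


if __name__ == "__main__":
    print(single_symbol_tokens("CC(=O)Cl"))
    print(symbol_centered_fragments("C=O"))
```

Under this reading, each fragment keeps a symbol together with its immediate context, which is one plausible way that fragments such as halogenated or nitro motifs could later be traced back to structural alerts.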
Created:
2025-08-26