Enhancing Toxicity Prediction of Synthetic Chemicals via Novel SMILES Fragmentation and Interpretable Deep Learning

Source: Figshare. Updated 2025-08-26; indexed 2026-04-28.

Download link:

https://figshare.com/articles/dataset/Enhancing_Toxicity_Prediction_of_Synthetic_Chemicals_via_Novel_SMILES_Fragmentation_and_Interpretable_Deep_Learning/29988190

Description:

Predicting toxicity and identifying structural alerts (SAs) for synthetic chemicals are critical for assessing risks to environmental and human health. Traditional methods, which rely heavily on molecular descriptors, often suffer from poor interpretability. Here, we introduce a novel framework that integrates SMILES fragmentation strategies with a 1D convolutional neural network deep learning model (denoted SFDL) for predicting chemical toxicity and the associated SAs. Four fragmentation methods (single-atom, single-symbol, atom-centered, and symbol-centered) were evaluated to generate tokenizers (denoted GenTok) from 581,537 high-interest PubChem compounds. The symbol-centered fragmentation approach performed best on the ISSSTY Ames mutagenicity data set (AUC = 0.87, PRAUC = 0.90). The SFDL-GenTok strategy showed robust predictive performance on 6 of the 10 toxicity end points (AUC = 0.81–0.93, PRAUC = 0.70–0.94). Based on these models, toxicity predictions were made for 28,160 synthetic chemicals. Potentially toxic compounds were then categorized into three groups: endocrine disruption, mutagenicity, and mitochondrial toxicity. SA analysis revealed that halogenated fragments, nitro or phenolic groups, and reactive electrophilic motifs are critical contributors to endocrine disruption, mitochondrial toxicity, and mutagenicity. This study provides an interpretable tool for toxicity prediction and SA identification of synthetic chemicals.
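The description names four fragmentation methods but does not define them, so the following is only a minimal Python sketch of what symbol-level and symbol-centered tokenization of a SMILES string might look like. The regular expression is a commonly used pattern for splitting SMILES into symbols; the function names, the `radius` parameter, and the interpretation of "symbol-centered" as a sliding window over neighboring symbols in the string are all assumptions made for illustration, not the authors' GenTok implementation.

```python
import re
from collections import Counter

# Commonly used pattern for splitting SMILES into symbols: bracket atoms,
# two-letter halogens, two-digit ring closures, organic-subset atoms,
# aromatic atoms, bond/branch characters, and single-digit ring closures.
SMILES_SYMBOL = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOSPFI]|[bcnosp]|[=#$:/\\\-+\.()~*@]|\d)"
)


def single_symbol_tokens(smiles):
    """Split a SMILES string into one token per symbol (assumed scheme)."""
    tokens = SMILES_SYMBOL.findall(smiles)
    # Refuse input the pattern cannot fully cover instead of dropping text.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def symbol_centered_fragments(smiles, radius=1):
    """Hypothetical 'symbol-centered' scheme: each symbol together with up
    to `radius` neighboring symbols on each side in the linear string."""
    tokens = single_symbol_tokens(smiles)
    fragments = []
    for i in range(len(tokens)):
        start = max(0, i - radius)
        fragments.append("".join(tokens[start:i + radius + 1]))
    return fragments


def build_vocabulary(smiles_list, radius=1):
    """Map each distinct fragment to an integer id, most frequent first.
    Id 0 is reserved for padding, as is usual before a 1D CNN."""
    counts = Counter()
    for smiles in smiles_list:
        counts.update(symbol_centered_fragments(smiles, radius))
    return {frag: idx for idx, (frag, _) in enumerate(counts.most_common(), start=1)}


if __name__ == "__main__":
    nitrobenzene = "c1ccccc1[N+](=O)[O-]"
    print(single_symbol_tokens(nitrobenzene))
    print(symbol_centered_fragments(nitrobenzene, radius=1))
```

In a sketch like this, the integer ids from `build_vocabulary` would be padded to a fixed length and passed to an embedding layer ahead of the convolutional layers; fragments with high attribution scores could then be read back as candidate SAs. The actual fragment definitions and model details should be taken from the Figshare record itself.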

Created: 2025-08-26
