An effective machine learning model for rat acute oral toxicity prediction of emerging chemicals: multi-domain applications and structure-activity relationships

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://figshare.com/articles/dataset/An_effective_machine_learning_model_for_rat_acute_oral_toxicity_prediction_of_emerging_chemicals_multi-domain_applications_and_structure-activity_relationships/29712744

Description:

Given the widespread presence of emerging contaminants in the environment, assessing and ensuring their biosafety is urgent. Under the Globally Harmonized System (GHS), the LD50 parameter of acute oral toxicity (AOT) is crucial for chemical safety classification. The limitations of animal testing have highlighted the need for alternative methods, and machine learning offers a new approach: predicting LD50 through quantitative structure-activity relationship (QSAR) models. This study developed and optimized a machine learning model for LD50 classification of emerging contaminants, based on more than 6000 known AOT data points. Using molecular descriptors and fingerprints, the model achieves an accuracy above 0.86 and a recall above 0.84, outperforming previous models. The model’s robustness was confirmed across various types of emerging contaminants. Shapley additive explanations (SHAP) identified key descriptors such as BCUTp_1h, ATSC1pe, and SLogP_VSA4, while the information gain (IG) method highlighted the alert substructures P-O and P-S. These findings suggest that compounds with high polarizability, mean electronegativity, and significant surface area may adversely affect rats. The model enhances understanding of acute toxicity mechanisms and serves as a tool for early screening of safer compounds, promoting the design of greener chemicals.
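The description does not say how the information gain (IG) score for a substructure was computed. As a rough illustration only, the sketch below uses the standard Shannon-entropy definition of information gain for a binary "substructure present" flag against a binary toxicity label. The function names and the toy data are hypothetical and are not taken from the dataset or the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(has_substructure, labels):
    """Entropy reduction in `labels` after splitting on a binary substructure flag."""
    present = [y for flag, y in zip(has_substructure, labels) if flag]
    absent = [y for flag, y in zip(has_substructure, labels) if not flag]
    total = len(labels)
    conditional = (
        (len(present) / total) * entropy(present)
        + (len(absent) / total) * entropy(absent)
    )
    return entropy(labels) - conditional


# Hypothetical toy data (assumption, not from the dataset):
# label 1 = more toxic class, 0 = less toxic class.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
# Made-up flags marking which toy compounds contain a P-S bond.
toy_has_p_s = [True, True, True, False, False, False, False, False]

ig = information_gain(toy_has_p_s, toy_labels)
print(f"IG = {ig:.3f} bits")
```

Under this definition, a substructure whose presence separates the toxicity classes well receives a high score, which is one plausible reading of how P-O and P-S could stand out as alerts.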

Created:

2025-07-31