Two-Stage Machine Learning-Based Approach to Predict Points of Departure for Human Noncancer and Developmental/Reproductive Effects
Source: acs.figshare.com. Updated 2024-08-19; indexed 2025-01-15.
Download link:
https://acs.figshare.com/articles/dataset/Two-Stage_Machine_Learning-Based_Approach_to_Predict_Points_of_Departure_for_Human_Noncancer_and_Developmental_Reproductive_Effects/25735511/2
Description:
Chemical points of departure (PODs) for critical health effects are crucial for evaluating and managing human health risks and impacts from exposure. However, PODs are unavailable for most chemicals in commerce due to a lack of in vivo toxicity data. We therefore developed a two-stage machine learning (ML) framework to predict human-equivalent PODs for oral exposure to organic chemicals based on chemical structure. Utilizing ML-based predictions for structural/physical/chemical/toxicological properties from OPERA 2.9 as features (Stage 1), ML models using random forest regression were trained with human-equivalent PODs derived from in vivo data sets for general noncancer effects (n = 1,791) and reproductive/developmental effects (n = 2,228), with robust cross-validation for feature selection and estimating generalization errors (Stage 2). These two-stage models accurately predicted PODs for both effect categories with cross-validation-based root-mean-squared errors less than an order of magnitude. We then applied one or both models to 34,046 chemicals expected to be in the environment, revealing several thousand chemicals of moderate concern and several hundred chemicals of high concern for health effects at estimated median population exposure levels. Further application can expand by orders of magnitude the coverage of organic chemicals that can be evaluated for their human health risks and impacts.
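
To make the error statement concrete: a root-mean-squared error "less than an order of magnitude" is most naturally read as an RMSE below 1 when PODs are compared on a log10 scale. That reading is an assumption here, as the description does not spell out the transformation. The minimal sketch below uses a hypothetical helper, `log10_rmse`, and made-up POD values that are not taken from the dataset.

```python
import math


def log10_rmse(observed_pods, predicted_pods):
    """RMSE between observed and predicted PODs after a log10 transform.

    Assumption: both lists hold positive PODs in the same units
    (for example mg/kg-day). A result of 1.0 corresponds to a typical
    error of one order of magnitude (a factor of 10).
    """
    if not observed_pods or len(observed_pods) != len(predicted_pods):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (math.log10(obs) - math.log10(pred)) ** 2
        for obs, pred in zip(observed_pods, predicted_pods)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values for illustration only (not from the dataset).
observed = [10.0, 1.0, 100.0, 0.1]
predicted = [1.0, 1.0, 1000.0, 0.1]

rmse = log10_rmse(observed, predicted)
fold_error = 10 ** rmse  # the same error expressed as a multiplicative factor

print(f"log10 RMSE: {rmse:.3f}")
print(f"typical fold error: {fold_error:.2f}x")
print("within one order of magnitude:", rmse < 1.0)
```

Working on the log scale treats a tenfold overprediction and a tenfold underprediction as equally large errors, which suits quantities such as PODs that span many orders of magnitude.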
Provider:
ACS Publications



