
Two-Stage Machine Learning-Based Approach to Predict Points of Departure for Human Noncancer and Developmental/Reproductive Effects

acs.figshare.com, updated 2024-08-19, indexed 2025-01-15
Download link:
https://acs.figshare.com/articles/dataset/Two-Stage_Machine_Learning-Based_Approach_to_Predict_Points_of_Departure_for_Human_Noncancer_and_Developmental_Reproductive_Effects/25735511/2
Description:
Chemical points of departure (PODs) for critical health effects are crucial for evaluating and managing human health risks and impacts from exposure. However, PODs are unavailable for most chemicals in commerce because in vivo toxicity data are lacking. We therefore developed a two-stage machine learning (ML) framework that predicts human-equivalent PODs for oral exposure to organic chemicals from chemical structure. In Stage 1, ML-based predictions of structural, physical, chemical, and toxicological properties from OPERA 2.9 were used as features. In Stage 2, random forest regression models were trained on human-equivalent PODs derived from in vivo data sets for general noncancer effects (n = 1,791) and reproductive/developmental effects (n = 2,228), with robust cross-validation used both for feature selection and for estimating generalization error. These two-stage models accurately predicted PODs for both effect categories, with cross-validation-based root-mean-squared errors of less than an order of magnitude. We then applied one or both models to 34,046 chemicals expected to be in the environment, identifying several thousand chemicals of moderate concern and several hundred of high concern for health effects at estimated median population exposure levels. Further application of this approach can increase, by orders of magnitude, the number of organic chemicals whose human health risks and impacts can be evaluated.
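The Stage 2 evaluation can be illustrated with a minimal, stdlib-only Python sketch of k-fold cross-validated RMSE on log10-transformed PODs. It assumes PODs are modeled on a log10 scale, which is how an RMSE "less than an order of magnitude" is read here: RMSE below 1 log10 unit, i.e. predictions typically within roughly a factor of 10 of the reference POD. To stay dependency-free, the sketch swaps in a simple k-nearest-neighbors regressor for the random forest used in the study, omits feature selection, and runs on synthetic toy data rather than OPERA 2.9 features. The function names (`kfold_indices`, `knn_predict`, `cv_rmse_log10`) are hypothetical.

```python
import math
import random
from statistics import mean


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 reproducibly and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Stand-in regressor (NOT a random forest): average the log10 PODs
    of the n_neighbors training chemicals closest to x in feature space."""
    nearest = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    return mean(yi for _, yi in nearest[:n_neighbors])


def cv_rmse_log10(X, pod, k=5, seed=0):
    """k-fold cross-validated RMSE of predicted log10(POD).

    Each chemical is predicted by a model that never saw it during training,
    so the result estimates generalization error in log10 units."""
    y = [math.log10(p) for p in pod]
    sq_errors = []
    for test_idx in kfold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test_idx:
            sq_errors.append((knn_predict(train_X, train_y, X[i]) - y[i]) ** 2)
    return math.sqrt(mean(sq_errors))


if __name__ == "__main__":
    # Synthetic toy data: one feature per chemical, log10(POD) = 2 * feature.
    X = [(i / 10,) for i in range(20)]
    pod = [10 ** (2 * x[0]) for x in X]
    rmse = cv_rmse_log10(X, pod)
    # RMSE < 1 log10 unit corresponds to errors under an order of magnitude;
    # 10**rmse is a rough multiplicative (fold) error.
    print(f"CV RMSE = {rmse:.2f} log10 units (~{10 ** rmse:.1f}-fold)")
```

In a real pipeline, a library model such as scikit-learn's `RandomForestRegressor` could take the place of `knn_predict`. The cross-validation loop that estimates generalization error would stay the same.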

Provider:
ACS Publications