What is the ecotoxicity of a given chemical for a given aquatic species? Predicting interactions between species and chemicals using recommender system techniques
DataCite Commons | Updated 2023-11-03 | Indexed 2024-09-03
Download link:
https://tandf.figshare.com/articles/dataset/What_is_the_ecotoxicity_of_a_given_chemical_for_a_given_aquatic_species_Predicting_interactions_between_species_and_chemicals_using_recommender_system_techniques/24091888/1
Description:
Ecotoxicological safety assessment of chemicals requires toxicity data on multiple species, despite the general desire to minimize animal testing. Predictive models, specifically machine learning (ML) methods, are among the tools capable of resolving this apparent contradiction, as they can generalize toxicity patterns across chemicals and species. However, although large public toxicity datasets are available, the data are highly sparse, which complicates model development. The aim of this study is to provide insights into how ML can predict toxicity using a large but sparse dataset. We developed models to predict LC50 values, based on experimental LC50 data covering 2431 organic chemicals and 1506 aquatic species from the ECOTOX database. Several well-known ML techniques were evaluated and a new ML model, inspired by recommender systems, was developed. This new model is a simple linear model that learns low-rank interactions between species and chemicals using factorization machines. We evaluated the predictive performance of the developed models in two validation settings: 1) predicting unseen chemical-species pairs, and 2) predicting unseen chemicals. The results of this study show that ML models can accurately predict LC50 values in both validation settings. Moreover, we show that the novel factorization machine approach can match the performance of well-tuned, complex ML approaches.
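To illustrate what "low-rank interactions between species and chemicals" can mean in practice, the following is a minimal sketch of the prediction step of a generic second-order factorization machine. It is not the authors' implementation: the one-hot feature encoding, the rank `k`, the random parameters, and the helper names `fm_predict` and `one_hot_pair` are all assumptions made for illustration, and no training step is shown.

```python
import numpy as np

def fm_predict(X, w0, w, V):
    """Second-order factorization machine prediction (generic form).

    X  : (n_samples, n_features) feature matrix
    w0 : scalar bias
    w  : (n_features,) linear weights
    V  : (n_features, k) latent factors; <V[i], V[j]> models the
         interaction between features i and j with rank k
    """
    linear = X @ w
    # Pairwise term computed in O(n_features * k) via the identity
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]
    xv = X @ V
    x2v2 = (X ** 2) @ (V ** 2)
    pairwise = 0.5 * np.sum(xv ** 2 - x2v2, axis=1)
    return w0 + linear + pairwise

def one_hot_pair(species_idx, chemical_idx, n_species, n_chemicals):
    """Hypothetical encoding: one-hot species followed by one-hot chemical."""
    x = np.zeros(n_species + n_chemicals)
    x[species_idx] = 1.0
    x[n_species + chemical_idx] = 1.0
    return x

# Toy sizes and random parameters (assumptions, not values from the study).
rng = np.random.default_rng(0)
n_species, n_chemicals, k = 4, 5, 3
w0 = 0.1
w = rng.normal(size=n_species + n_chemicals)
V = rng.normal(size=(n_species + n_chemicals, k))

x = one_hot_pair(2, 3, n_species, n_chemicals)
pred = fm_predict(x[None, :], w0, w, V)[0]

# With a pure one-hot pair, the model reduces to
# bias + species effect + chemical effect + <species factor, chemical factor>.
expected = w0 + w[2] + w[n_species + 3] + V[2] @ V[n_species + 3]
```

Under this assumed encoding, the prediction for a species-chemical pair is a bias, one term per species, one term per chemical, and a dot product of two short latent vectors, which is the same structure used in matrix-factorization recommender systems. Adding further columns to `X` (for example chemical descriptors) would let the same formula produce predictions for chemicals that have no training data of their own.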
Provider:
Taylor & Francis
Created:
2023-09-06



