ChemBioSim: Enhancing Conformal Prediction of In Vivo Toxicity by Use of Predicted Bioactivities
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/ChemBioSim_Enhancing_Conformal_Prediction_of_In_Vivo_Toxicity_by_Use_of_Predicted_Bioactivities/14818522
Resource description:
Computational methods such as machine learning approaches have
a strong track record of success in predicting the outcomes of in
vitro assays. In contrast, their ability to predict in vivo endpoints
is more limited due to the high number of parameters and processes
that may influence the outcome. Recent studies have shown that the
combination of chemical and biological data can yield better models
for in vivo endpoints. The ChemBioSim approach presented in this work
aims to enhance the performance of conformal prediction models for
in vivo endpoints by combining chemical information with (predicted)
bioactivity assay outcomes. Three in vivo toxicological endpoints,
capturing genotoxic (MNT), hepatic (DILI), and cardiological (DICC)
issues, were selected for this study due to their high relevance for
the registration and authorization of new compounds. Since the sparsity
of available biological assay data is challenging for predictive modeling,
predicted bioactivity descriptors were introduced instead. Thus, a
machine learning model for each of the 373 collected biological assays
was trained and applied to the compounds of the in vivo toxicity data
sets. Besides the chemical descriptors (molecular fingerprints and
physicochemical properties), these predicted bioactivities served
as descriptors for the models of the three in vivo endpoints. For
this study, a workflow based on a conformal prediction framework (a
method for confidence estimation) built on random forest models was
developed. Furthermore, the most relevant chemical and bioactivity
descriptors for each in vivo endpoint were preselected with lasso
models. The incorporation of bioactivity descriptors increased the
mean F1 score of the MNT model from 0.61 to 0.70 and that of the DICC
model from 0.72 to 0.82, while the mean efficiencies increased by roughly
0.10 for both endpoints. In contrast, for the DILI endpoint, no significant
improvement in model performance was observed. Besides pure performance
improvements, an analysis of the most important bioactivity features
allowed detection of novel and less intuitive relationships between
the predicted biological assay outcomes used as descriptors and the
in vivo endpoints. This study presents how the prediction of in vivo
toxicity endpoints can be improved by the incorporation of biological
information, which is not necessarily captured by chemical descriptors, in
an automated workflow without the need for adding experimental workload
for the generation of bioactivity descriptors, since predicted outcomes
of bioactivity assays were utilized. All bioactivity CP models for
deriving the predicted bioactivities, as well as the in vivo toxicity
CP models, can be freely downloaded from https://doi.org/10.5281/zenodo.4761225.
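
To make the confidence-estimation idea in the description more concrete, the following is a minimal sketch of class-conditional (Mondrian) conformal prediction for a binary endpoint. It is not the ChemBioSim implementation. The function names, the toy calibration scores, and the significance level of 0.25 are all assumptions chosen for illustration. The nonconformity score is assumed to be one minus the class probability reported by an underlying classifier (for example, a random forest). Efficiency is computed here as the fraction of single-label prediction sets, which is one common definition.

```python
from typing import Dict, List, Set


def mondrian_p_values(
    cal_scores: Dict[int, List[float]],
    test_scores: Dict[int, float],
) -> Dict[int, float]:
    """Class-conditional conformal p-values for one test compound.

    cal_scores[c]  : nonconformity scores of calibration compounds whose
                     true class is c (assumed: 1 - predicted P(c)).
    test_scores[c] : nonconformity score of the test compound under the
                     hypothesis that its class is c.
    """
    p_values = {}
    for label, alpha in test_scores.items():
        cal = cal_scores[label]
        # Count calibration compounds that conform no better than the test compound.
        n_at_least = sum(1 for a in cal if a >= alpha)
        p_values[label] = (n_at_least + 1) / (len(cal) + 1)
    return p_values


def prediction_set(p_values: Dict[int, float], significance: float) -> Set[int]:
    """Keep every label whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > significance}


def efficiency(sets: List[Set[int]]) -> float:
    """Fraction of prediction sets that contain exactly one label."""
    return sum(1 for s in sets if len(s) == 1) / len(sets)


# Toy calibration scores per true class (hypothetical values).
calibration = {0: [0.1, 0.2, 0.3, 0.6], 1: [0.05, 0.25, 0.4, 0.7]}

# Test compound for which the classifier reports P(class 1) = 0.8.
p = mondrian_p_values(calibration, {0: 0.8, 1: 0.2})
labels = prediction_set(p, significance=0.25)
```

In this toy case the test compound receives a single-label prediction set containing class 1. A compound with both p-values above the significance level would instead receive both labels, and one with neither would receive an empty set; neither of those outcomes counts toward efficiency.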
Created:
2021-06-21



