Predicting Chemical-Induced Liver Toxicity Using High-Content Imaging Phenotypes and Chemical Descriptors: A Random Forest Approach
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/Predicting_Chemical-Induced_Liver_Toxicity_Using_High-Content_Imaging_Phenotypes_and_Chemical_Descriptors_A_Random_Forest_Approach/12937633
Resource description:
Hepatotoxicity
is a major reason for the withdrawal or discontinuation
of drugs from clinical trials. Thus, better tools are needed to filter
potentially hepatotoxic drugs early in drug discovery. Our study demonstrates
the use of high-content imaging (HCI) phenotypes, chemical descriptors, and both combined
(hybrid) descriptors to construct random forest classifiers (RFCs)
for the prediction of hepatotoxicity. HCI data published by Broad
Institute provided HCI phenotypes for about 30 000 samples
in multiple replicates. Phenotypes belonging to 346 chemicals, which
were tested in up to eight replicates, were chosen as a basis for
our analysis. We then constructed individual RFC models for HCI phenotypes,
chemical descriptors, and hybrid (chemical and HCI) descriptors. The
model that was constructed using selective hybrid descriptors showed
high predictive performance with 5-fold cross validation (CV) balanced
accuracy (BA) at 0.71, whereas within the given applicability domain
(AD), independent test set and external test set prediction BAs were
equal to 0.61 and 0.60, respectively. The model constructed using
chemical descriptors showed a similar predictive performance with
a 5-fold CV BA equal to 0.66, a test set prediction BA within the
AD equal to 0.56, and an external test set prediction BA within the
AD equal to 0.50. In conclusion, the hybrid and chemical descriptor-based
models presented here should be considered as a new tool for filtering
hepatotoxic molecules during compound prioritization in drug discovery.
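
The description reports every result as balanced accuracy (BA). As a reading aid, the sketch below computes BA for binary labels as the mean of sensitivity and specificity. It is a minimal illustration, not the authors' evaluation code: the function name `balanced_accuracy`, the encoding 1 = hepatotoxic, and the example labels are all assumptions made here.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels.

    Assumed encoding (not from the source): 1 = hepatotoxic, 0 = non-hepatotoxic.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # toxic, predicted toxic
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-toxic, predicted non-toxic
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # toxic, missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-toxic, flagged

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    sensitivity = tp / (tp + fn)  # fraction of toxic compounds recovered
    specificity = tn / (tn + fp)  # fraction of non-toxic compounds recovered
    return (sensitivity + specificity) / 2


# Made-up labels for illustration only; not taken from the dataset.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
print(balanced_accuracy(y_true, y_pred))  # 0.625 (sensitivity 0.75, specificity 0.5)
```

Under this definition, a classifier that assigns every compound to the same class scores 0.5, so a BA of 0.50 corresponds to no discrimination between the two classes.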
Created:
2020-09-10



