
Predicting Chemical-Induced Liver Toxicity Using High-Content Imaging Phenotypes and Chemical Descriptors: A Random Forest Approach

Indexed by NIAID Data Ecosystem: 2026-03-12
Download link:
https://figshare.com/articles/dataset/Predicting_Chemical-Induced_Liver_Toxicity_Using_High-Content_Imaging_Phenotypes_and_Chemical_Descriptors_A_Random_Forest_Approach/12937633
Resource description:
Hepatotoxicity is a major reason for the withdrawal or discontinuation of drugs in clinical trials. Better tools are therefore needed to filter out potentially hepatotoxic drugs early in drug discovery. Our study demonstrates the use of high-content imaging (HCI) phenotypes, chemical descriptors, and a combination of both (hybrid descriptors) to construct random forest classifiers (RFCs) that predict hepatotoxicity. HCI data published by the Broad Institute provided phenotypes for about 30 000 samples measured in multiple replicates. Phenotypes for 346 chemicals, each tested in up to eight replicates, were selected as the basis for our analysis. We then built separate RFC models from HCI phenotypes, chemical descriptors, and hybrid (chemical plus HCI) descriptors. The model built from selected hybrid descriptors achieved a 5-fold cross-validation (CV) balanced accuracy (BA) of 0.71; within the applicability domain (AD), its BAs on the independent test set and the external test set were 0.61 and 0.60, respectively. The model built from chemical descriptors alone performed similarly, with a 5-fold CV BA of 0.66, a test set BA within the AD of 0.56, and an external test set BA within the AD of 0.50. In conclusion, the hybrid and chemical descriptor-based models presented here can be considered a new tool for filtering out hepatotoxic molecules during compound prioritization in drug discovery.
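
The abstract reports balanced accuracy (BA), the mean of sensitivity and specificity, both under 5-fold CV and restricted to test compounds inside an applicability domain (AD). The following is a minimal, stdlib-only sketch of that evaluation step under stated assumptions; it is not the authors' code. The abstract does not say how the AD was defined, so the sketch assumes a common distance-based rule: a compound is in-domain if the mean Euclidean distance to its k nearest training compounds does not exceed the training-set mean of that statistic plus z standard deviations. Treating a hybrid descriptor as a plain concatenation of chemical and HCI features is likewise an assumption, and the names `make_hybrid`, `ad_threshold`, `ba_within_ad` and the defaults `k=3`, `z=0.5` are hypothetical. Descriptors are assumed to be scaled already, since raw features on different scales would dominate the distances. The random forest itself is not shown; its predictions are simply passed in as `y_pred`.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def make_hybrid(chem: Vector, hci: Vector) -> List[float]:
    """Hypothetical hybrid descriptor: chemical and HCI features concatenated."""
    return list(chem) + list(hci)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """BA = (sensitivity + specificity) / 2 for binary labels (1 = hepatotoxic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p != 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p != 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present to compute BA")
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def knn_mean_distance(x: Vector, train: Sequence[Vector], k: int,
                      exclude_self: bool = False) -> float:
    """Mean Euclidean distance from x to its k nearest training vectors."""
    dists = sorted(math.dist(x, t) for t in train)
    if exclude_self:
        dists = dists[1:]  # x is in train: drop its zero distance to itself
    if len(dists) < k:
        raise ValueError("not enough training vectors for this k")
    return sum(dists[:k]) / k


def ad_threshold(train: Sequence[Vector], k: int = 3, z: float = 0.5) -> float:
    """Assumed AD cutoff: mean + z * std of training k-NN mean distances."""
    stats = [knn_mean_distance(x, train, k, exclude_self=True) for x in train]
    mu = sum(stats) / len(stats)
    sd = math.sqrt(sum((s - mu) ** 2 for s in stats) / len(stats))
    return mu + z * sd


def ba_within_ad(train, test, y_true, y_pred,
                 k: int = 3, z: float = 0.5) -> Tuple[float, float]:
    """Return (BA on in-domain test compounds, fraction of test set in domain)."""
    cutoff = ad_threshold(train, k, z)
    inside = [i for i, x in enumerate(test)
              if knn_mean_distance(x, train, k) <= cutoff]
    ba = balanced_accuracy([y_true[i] for i in inside],
                           [y_pred[i] for i in inside])
    return ba, len(inside) / len(test)


if __name__ == "__main__":
    # Toy 2-D "descriptors"; the third test compound lies far outside the domain.
    train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    test = [(0.5, 0.5), (0.2, 0.9), (10.0, 10.0)]
    ba, coverage = ba_within_ad(train, test, y_true=[1, 0, 1],
                                y_pred=[1, 0, 0], k=1)
    print(ba, round(coverage, 3))  # → 1.0 0.667
```

The toy run shows why BA within the AD has to be read together with coverage: the misclassified compound is excluded as out-of-domain, so BA rises while only two thirds of the test set is scored. For the classifier and the 5-fold CV itself, a library such as scikit-learn (`RandomForestClassifier` with `cross_val_score(..., scoring="balanced_accuracy")`) would be a typical choice, though the abstract does not name the software used.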

Created:
2020-09-10