Training Based on Ligand Efficiency Improves Prediction of Bioactivities of Ligands and Drug Target Proteins in a Machine Learning Approach

Indexed by NIAID Data Ecosystem on 2026-03-08

Download link:

https://figshare.com/articles/dataset/Training_Based_on_Ligand_Efficiency_Improves_Prediction_of_Bioactivities_of_Ligands_and_Drug_Target_Proteins_in_a_Machine_Learning_Approach/2362369


Description:

Machine learning methods based on ligand–protein interaction data in bioactivity databases are one of the current strategies for efficiently finding novel lead compounds as the first step in the drug discovery process. Although previous machine learning studies have predicted novel ligand–protein interactions with high performance, all studies to date have depended heavily on the direct use of raw bioactivity data, that is, ligand potencies measured as IC50, EC50, Ki, and Kd and deposited in databases. ChEMBL provides a unique opportunity to investigate whether a machine-learning-based classifier built on ligand efficiency, rather than on IC50, EC50, Ki, and Kd values, can also offer high predictive performance. Here we report that classifiers created from training data based on ligand efficiency show higher performance than those from data based on IC50 or Ki values. Using the GPCRSARfari and KinaseSARfari databases in ChEMBL, we created IC50- or Ki-based training data and binding efficiency index (BEI)-based training data, and then constructed classifiers using support vector machines (SVMs). The SVM classifiers from the BEI-based training data showed slightly higher area under the curve (AUC), accuracy, sensitivity, and specificity in the cross-validation tests. When the classifiers were applied to the validation data, the AUCs and specificities of the BEI-based classifiers increased dramatically in comparison with those of the IC50- or Ki-based classifiers. The improvement in predictive power of the BEI-based classifiers can be attributed to (i) the more separated distributions of positives and negatives, (ii) the higher diversity of negatives in the BEI-based training data in the feature space of the SVMs, and (iii) a more balanced number of positives and negatives in the BEI-based training data. These results strongly suggest that training data based on ligand efficiency, as well as data based on classical IC50, EC50, Kd, and Ki values, are important when creating a classifier using a machine learning approach based on bioactivity data.
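
To make the difference between potency-based and BEI-based labeling concrete, here is a minimal Python sketch. It uses the commonly cited definition of BEI as pIC50 (or pKi) divided by molecular weight in kDa. The compound names, potencies, molecular weights, and both cutoffs are hypothetical values chosen for illustration; the description above does not state the thresholds the authors used.

```python
import math


def p_activity(value_nm: float) -> float:
    """Convert a potency given in nM (IC50 or Ki) to -log10 of its molar value."""
    if value_nm <= 0:
        raise ValueError("potency must be positive")
    return 9.0 - math.log10(value_nm)


def bei(value_nm: float, mol_weight_da: float) -> float:
    """Binding efficiency index: pIC50 (or pKi) per kDa of molecular weight."""
    if mol_weight_da <= 0:
        raise ValueError("molecular weight must be positive")
    return p_activity(value_nm) / (mol_weight_da / 1000.0)


# Hypothetical cutoffs, NOT taken from the paper.
POTENCY_CUTOFF_NM = 1000.0  # positive if IC50/Ki is at or below this value
BEI_CUTOFF = 20.0           # positive if BEI is at or above this value


def label_by_potency(value_nm: float) -> bool:
    """Label a compound as positive from its raw potency alone."""
    return value_nm <= POTENCY_CUTOFF_NM


def label_by_bei(value_nm: float, mol_weight_da: float) -> bool:
    """Label a compound as positive from its size-normalized potency."""
    return bei(value_nm, mol_weight_da) >= BEI_CUTOFF


# Hypothetical compounds: (name, potency in nM, molecular weight in Da).
compounds = [
    ("cmpd_A", 10.0, 250.0),    # potent and small
    ("cmpd_B", 10.0, 600.0),    # equally potent but much larger
    ("cmpd_C", 5000.0, 300.0),  # weak binder
]

if __name__ == "__main__":
    for name, potency_nm, mw in compounds:
        print(
            name,
            f"BEI={bei(potency_nm, mw):.2f}",
            f"potency_label={label_by_potency(potency_nm)}",
            f"bei_label={label_by_bei(potency_nm, mw)}",
        )
```

In this toy set, cmpd_A and cmpd_B have the same potency and therefore the same potency-based label, but only the smaller cmpd_A passes the BEI cutoff. A size-normalized criterion can therefore move compounds between the positive and negative classes, which changes the class balance of the training data.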

Created:

2013-10-28
