Training Based on Ligand Efficiency Improves Prediction of Bioactivities of Ligands and Drug Target Proteins in a Machine Learning Approach
Source: NIAID Data Ecosystem (indexed 2026-03-08)
Download link:
https://figshare.com/articles/dataset/Training_Based_on_Ligand_Efficiency_Improves_Prediction_of_Bioactivities_of_Ligands_and_Drug_Target_Proteins_in_a_Machine_Learning_Approach/2362369
Description:
Machine learning methods based on ligand–protein interaction data in
bioactivity databases are one of the current strategies for efficiently
finding novel lead compounds as the first step in the drug discovery
process. Although previous machine learning studies have predicted
novel ligand–protein interactions with high performance, all of them
have depended heavily on raw bioactivity data, namely ligand potencies
measured as IC50, EC50, Ki, and Kd values deposited in databases. ChEMBL
provides an opportunity to investigate whether a machine-learning-based
classifier trained on ligand efficiency, rather than on raw IC50, EC50,
Ki, and Kd values, can also offer high predictive performance. Here we
report that classifiers created from training
data based on ligand efficiency show higher performance than those
from data based on IC50 or Ki values. Using the GPCRSARfari and
KinaseSARfari databases in ChEMBL, we created IC50- or Ki-based
training data and binding efficiency index (BEI)-based training data,
and then constructed classifiers using support vector machines (SVMs).
The SVM classifiers from the BEI-based training data showed slightly
higher area under the curve (AUC), accuracy, sensitivity, and specificity
in the cross-validation tests. Application of the classifiers to the
validation data demonstrated that the AUCs and specificities of the
BEI-based classifiers were dramatically higher than those of the
IC50- or Ki-based classifiers.
The improvement of the predictive power by the BEI-based classifiers
can be attributed to (i) the more separated distributions of positives
and negatives, (ii) the higher diversity of negatives in the BEI-based
training data in a feature space of SVMs, and (iii) a more balanced
number of positives and negatives in the BEI-based training data.
These results strongly suggest that training data based on ligand
efficiency, as well as data based on classical IC50, EC50, Kd, and Ki
values, are important when creating a classifier using a machine
learning approach based on bioactivity data.
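
The description above does not spell out how BEI is computed. A commonly used definition is the negative log10 of the potency in molar units (pIC50 or pKi) divided by the molecular weight in kDa. The following is a minimal Python sketch under that assumption. The function names `p_activity` and `bei` are hypothetical and are not taken from the authors' code, and the example potencies and molecular weights are made up for illustration.

```python
import math


def p_activity(potency_nm: float) -> float:
    """Convert a potency in nanomolar (IC50 or Ki) to -log10 of molar units."""
    if potency_nm <= 0:
        raise ValueError("potency must be positive")
    # 1 nM = 1e-9 M, so -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(potency_nm)


def bei(potency_nm: float, mol_weight_da: float) -> float:
    """Binding efficiency index, assumed here to be p(activity) / MW in kDa."""
    if mol_weight_da <= 0:
        raise ValueError("molecular weight must be positive")
    return p_activity(potency_nm) / (mol_weight_da / 1000.0)


if __name__ == "__main__":
    # Two illustrative (made-up) ligands with the same IC50 but different sizes.
    for name, ic50_nm, mw in [("small", 100.0, 300.0), ("large", 100.0, 600.0)]:
        print(name, round(p_activity(ic50_nm), 2), round(bei(ic50_nm, mw), 2))
```

Under this definition, two ligands with the same raw IC50 receive different BEI values when their molecular weights differ, so labeling positives and negatives by BEI can split the training data differently than labeling by IC50 or Ki alone.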
Created:
2013-10-28



