The Development of Target-Specific Machine Learning Models as Scoring Functions for Docking-Based Target Prediction
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://figshare.com/articles/dataset/The_Development_of_Target-Specific_Machine_Learning_Models_as_Scoring_Functions_for_Docking-Based_Target_Prediction/7857554
Description:
The identification of possible targets for a known bioactive compound
is of the utmost importance for drug design and development. Molecular
docking is one possible approach for in-silico protein
target prediction, in which a molecule is docked into several different
protein structures to identify potential targets. This reverse docking
approach is hampered by the limited ability of current scoring functions
to correctly discriminate between targets and nontargets. In this
work, the development of target-specific scoring functions is described
that showed improved performance in correct target
prediction for both actives and decoys on three validation data sets.
In contrast to pure ligand-based approaches, which are in general faster
and include a greater target space, docking-based approaches can also
cover unknown chemical space that lies outside the known bioactivity
data. These target-specific scoring functions are based on known bioactivity
data retrieved from ChEMBL and supervised machine learning approaches.
Neural network and support vector machine (SVM) models were trained
for 20 different protein targets. Our protein–ligand interaction
fingerprint PADIF (Protein Atom Score Contributions Derived Interaction
Fingerprint) serves as the input for training; the PADIFs
are calculated from docking poses of active and inactive compounds.
Different data sets of previously unseen molecules were used for the
final evaluation and analysis of the prediction performance of the
created models. For a single-target selectivity data set, the correct
target model returns, in most cases, the highest probability
scores for its active molecules, with statistically significant
differences from the other targets. These probability scores were
also predicted and successfully used to rank the targets for molecules
of a multitarget data set with activity data described simultaneously
for two, three, and four to seven protein targets.
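
The target-ranking step described above can be illustrated with a minimal sketch. It assumes each target-specific model is available as a function that maps a PADIF-like feature vector to a probability of activity. The names `TargetModel` and `rank_targets`, and the toy models and vectors, are hypothetical stand-ins and are not taken from the paper or the dataset.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: a per-target model maps a PADIF-like feature vector
# (computed from the molecule's docking pose in that target) to a
# probability of being active. Real models would be trained classifiers.
TargetModel = Callable[[Sequence[float]], float]


def rank_targets(
    models: Dict[str, TargetModel],
    fingerprints: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score one molecule against every target and sort by probability."""
    scores = []
    for target, model in models.items():
        # Each target has its own docking pose, hence its own fingerprint.
        probability = model(fingerprints[target])
        scores.append((target, probability))
    # Highest probability first: the top entry is the predicted target.
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy stand-ins for trained classifiers (illustrative only).
    toy_models = {
        "target_A": lambda fp: sum(fp) / len(fp),
        "target_B": lambda fp: max(fp),
        "target_C": lambda fp: min(fp),
    }
    # Toy stand-ins for per-target interaction fingerprints of one molecule.
    toy_fingerprints = {
        "target_A": [0.2, 0.4, 0.6],
        "target_B": [0.1, 0.9, 0.3],
        "target_C": [0.5, 0.7, 0.2],
    }
    for name, p in rank_targets(toy_models, toy_fingerprints):
        print(f"{name}: {p:.2f}")
```

For a multitarget molecule, the top few entries of the ranked list would be compared against its known targets. How many entries to consider is a choice this sketch leaves open.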
Created:
2019-02-25



