The Development of Target-Specific Machine Learning Models as Scoring Functions for Docking-Based Target Prediction

Source: Figshare | Updated: 2019-03-18 | Indexed: 2026-04-29
Download link:
https://figshare.com/articles/dataset/The_Development_of_Target-Specific_Machine_Learning_Models_as_Scoring_Functions_for_Docking-Based_Target_Prediction/7857554
Description:
Identifying possible targets for a known bioactive compound is of the utmost importance for drug design and development. Molecular docking is one possible approach to in silico protein target prediction, in which a molecule is docked into several different protein structures to identify potential targets. This reverse docking approach is hampered by the limited ability of current scoring functions to discriminate correctly between targets and nontargets. This work describes the development of target-specific scoring functions that showed improved performance in correct target prediction for both actives and decoys on three validation data sets. In contrast to pure ligand-based approaches, which are generally faster and cover a larger target space, docking-based approaches can also cover unknown chemical space that lies outside the known bioactivity data. These target-specific scoring functions are based on known bioactivity data retrieved from ChEMBL and on supervised machine learning approaches. Neural network and support vector machine (SVM) models were trained for 20 different protein targets. Our protein–ligand interaction fingerprint PADIF (Protein Atom Score Contributions Derived Interaction Fingerprint) serves as the input for training; the PADIFs are calculated from docking poses of active and inactive compounds. Different data sets of previously unseen molecules were used for the final evaluation and analysis of the prediction performance of the resulting models. For a single-target selectivity data set, the correct target model returns, in most cases, the highest probability scores for its active molecules, with statistically significant differences from the other targets. These probability scores were also predicted and then used successfully to rank the targets for molecules of a multitarget data set with activity data reported simultaneously for two, three, and four to seven protein targets.
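
The target-ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `rank_targets` and `make_logistic_model`, the two-element fingerprints, and all weights are hypothetical, and a toy logistic function stands in for the trained neural network or SVM models. It assumes only that each target has its own model and its own fingerprint (derived from the docking pose in that target), and that targets are ranked by the predicted probability of activity.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: a per-target model maps an interaction fingerprint
# (a vector of floats) to a probability that the molecule is active.
TargetModel = Callable[[Sequence[float]], float]


def make_logistic_model(weights: Sequence[float], bias: float) -> TargetModel:
    """Toy stand-in for a trained target-specific classifier (assumption)."""
    def model(fingerprint: Sequence[float]) -> float:
        z = bias + sum(w * x for w, x in zip(weights, fingerprint))
        return 1.0 / (1.0 + math.exp(-z))
    return model


def rank_targets(
    fingerprints: Dict[str, Sequence[float]],
    models: Dict[str, TargetModel],
) -> List[Tuple[str, float]]:
    """Score one molecule against every target and sort by probability.

    `fingerprints` holds one fingerprint per target, because the
    fingerprint depends on the docking pose in that target's structure.
    """
    scores = [(target, models[target](fp)) for target, fp in fingerprints.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Illustrative values only; real fingerprints and weights would come
# from docking poses and model training.
models = {
    "target_A": make_logistic_model([1.0, -0.5], bias=0.0),
    "target_B": make_logistic_model([0.2, 0.1], bias=-1.0),
}
fingerprints = {
    "target_A": [2.0, 1.0],
    "target_B": [1.0, 1.0],
}
ranking = rank_targets(fingerprints, models)
for target, probability in ranking:
    print(f"{target}: {probability:.3f}")
```

In this sketch the highest-ranked entry is the predicted target for the molecule; for a multitarget molecule, the top few entries would be compared against the known targets.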
Created:
2019-03-18