iDrug-Target: predicting the interactions between drug compounds and target proteins in cellular networking via benchmark dataset optimization approach

Figshare | Updated 2016-01-19 | Indexed 2026-04-29

Download link:

https://figshare.com/articles/dataset/iDrug_Target_predicting_the_interactions_between_drug_compounds_and_target_proteins_in_cellular_networking_via_benchmark_dataset_optimization_approach/1289323

Description:

Information about the interactions of drug compounds with proteins in cellular networking is very important for drug development. Unfortunately, all the existing predictors for identifying drug–protein interactions were trained on a skewed benchmark dataset in which the number of non-interactive drug–protein pairs is overwhelmingly larger than that of the interactive ones. Training predictors on such a highly unbalanced benchmark dataset would cause many interactive drug–protein pairs to be mispredicted as non-interactive. Since the minority interactive pairs often contain the most important information for drug design, it is necessary to minimize this kind of misprediction. In this study, we adopted the neighborhood cleaning rule and the synthetic minority over-sampling technique to treat the skewed benchmark datasets and balance the positive and negative subsets. The new benchmark datasets thus obtained are called the optimized benchmark datasets, based on which a new predictor called iDrug-Target was developed. It contains four sub-predictors: iDrug-GPCR, iDrug-Chl, iDrug-Ezy, and iDrug-NR, specialized for identifying the interactions of drug compounds with GPCRs (G-protein-coupled receptors), ion channels, enzymes, and NRs (nuclear receptors), respectively. Rigorous cross-validations on a set of experiment-confirmed datasets have indicated that these new predictors remarkably outperformed the existing ones for the same purpose. To maximize users’ convenience, a publicly accessible web server for iDrug-Target has been established at http://www.jci-bioinfo.cn/iDrug-Target/, by which users can easily get their desired results. It has not escaped our notice that the aforementioned strategy can be widely used in many other areas as well.
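The balancing strategy named in the description (neighborhood cleaning followed by synthetic minority over-sampling) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of k = 3, the Euclidean distance, and the two-dimensional toy feature vectors are all assumptions made for illustration, and the listing does not state how drug–protein pairs are actually encoded. Label 1 stands for an interactive pair (minority) and 0 for a non-interactive pair (majority).

```python
import math
import random


def k_nearest(idx, samples, k):
    """Indices of the k samples closest (Euclidean) to samples[idx], excluding idx."""
    x = samples[idx][0]
    others = [j for j in range(len(samples)) if j != idx]
    others.sort(key=lambda j: math.dist(x, samples[j][0]))
    return others[:k]


def neighborhood_clean(samples, k=3, minority=1):
    """Simplified neighborhood cleaning rule (hypothetical helper).

    A sample is 'outvoted' when fewer than half of its k nearest neighbors
    share its label. Outvoted majority samples are dropped; for an outvoted
    minority sample, its majority-class neighbors are dropped instead.
    Minority samples are never removed.
    """
    drop = set()
    for i, (_, label) in enumerate(samples):
        nbrs = k_nearest(i, samples, k)
        agree = sum(1 for j in nbrs if samples[j][1] == label)
        if 2 * agree > k:  # neighbors confirm the label: nothing to clean
            continue
        if label != minority:
            drop.add(i)
        else:
            drop.update(j for j in nbrs if samples[j][1] != minority)
    return [s for i, s in enumerate(samples) if i not in drop]


def smote(samples, k=3, minority=1, rng=None):
    """Simplified SMOTE (hypothetical helper).

    Adds synthetic minority samples, each placed at a random position on the
    segment between a minority sample and one of its k nearest minority
    neighbors, until both classes have the same size.
    Assumes at least two minority samples are present.
    """
    rng = rng or random.Random(0)
    minority_pts = [s for s in samples if s[1] == minority]
    needed = len(samples) - 2 * len(minority_pts)  # majority minus minority
    synthetic = []
    for _ in range(max(needed, 0)):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(k_nearest(i, minority_pts, k))
        gap = rng.random()
        a, b = minority_pts[i][0], minority_pts[j][0]
        point = tuple(p + gap * (q - p) for p, q in zip(a, b))
        synthetic.append((point, minority))
    return samples + synthetic


def balance(samples, k=3):
    """Clean the majority class first, then over-sample the minority class."""
    return smote(neighborhood_clean(samples, k), k)


# Toy data: 40 non-interactive pairs and 6 interactive pairs (made-up features).
toy_rng = random.Random(42)
non_interacting = [((toy_rng.gauss(0, 1), toy_rng.gauss(0, 1)), 0) for _ in range(40)]
interacting = [((toy_rng.gauss(2, 0.5), toy_rng.gauss(2, 0.5)), 1) for _ in range(6)]

balanced = balance(non_interacting + interacting)
counts = {lab: sum(1 for _, l in balanced if l == lab) for lab in (0, 1)}
print(counts)
```

Cleaning runs before over-sampling here so that synthetic points are not interpolated next to majority samples that the cleaning step would have removed anyway; the order used in the original study is not stated in this listing.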

Created:

2016-01-19