Efficient multi-task chemogenomics for drug specificity prediction

Source: NIAID Data Ecosystem (indexed 2026-03-10)

Download link:

https://figshare.com/articles/dataset/Efficient_multi-task_chemogenomics_for_drug_specificity_prediction/7168718

Description:

Adverse drug reactions, also called side effects, range from mild to fatal clinical events and significantly affect the quality of care. Among other causes, side effects occur when drugs bind to proteins other than their intended target. Because experimentally testing drug specificity against the entire proteome is out of reach, we investigate the application of chemogenomics approaches. We formulate the study of drug specificity as the problem of predicting interactions between drugs and proteins at the proteome scale. We build several benchmark datasets and propose NN-MT, a multi-task Support Vector Machine (SVM) algorithm that is trained on a limited number of data points in order to address the computational issues of proteome-wide SVMs for chemogenomics. We compare NN-MT to different state-of-the-art methods and show that its prediction performance is similar or better, at an efficient computational cost. Compared to its competitors, the proposed method is particularly effective at predicting (protein, ligand) interactions in the difficult double-orphan case, i.e. when no interactions are previously known for either the protein or the ligand. The NN-MT algorithm appears to be a good default method, providing state-of-the-art or better performance across the wide range of prediction scenarios considered in the present study: proteome-wide prediction, protein family prediction, test (protein, ligand) pairs dissimilar to pairs in the training set, and orphan cases.
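The idea of training a multi-task SVM on a limited number of data points can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a Kronecker-product kernel over (protein, ligand) pairs and a nearest-neighbour rule for picking the training pairs; the description above does not state how the limited training set is chosen, so that rule, the function names (`pair_kernel`, `select_training_pairs`), the parameters, and the toy similarity values are all hypothetical.

```python
from typing import Dict, List, Tuple

# (protein id, ligand id, label): +1 = interaction, -1 = no interaction
Pair = Tuple[str, str, int]
Sim = Dict[str, Dict[str, float]]


def pair_kernel(k_prot: Sim, k_lig: Sim, a: Tuple, b: Tuple) -> float:
    """Kronecker-product kernel between two (protein, ligand) pairs.

    Assumption: similarity of pairs = protein similarity * ligand similarity.
    """
    return k_prot[a[0]][b[0]] * k_lig[a[1]][b[1]]


def select_training_pairs(query_protein: str, known_pairs: List[Pair],
                          k_prot: Sim, n_neighbors: int = 2,
                          max_extra: int = 4) -> List[Pair]:
    """Pick a small training set for one query protein (hypothetical rule).

    Keeps every known pair of the query protein itself (intra-task), then adds
    at most `max_extra` pairs from its `n_neighbors` most similar proteins.
    """
    intra = [p for p in known_pairs if p[0] == query_protein]
    others = sorted({p[0] for p in known_pairs if p[0] != query_protein},
                    key=lambda q: (-k_prot[query_protein][q], q))
    neighbors = set(others[:n_neighbors])
    extra = [p for p in known_pairs if p[0] in neighbors][:max_extra]
    return intra + extra


# Toy similarity matrices and interaction data (made-up values).
k_prot = {
    "P1": {"P1": 1.0, "P2": 0.8, "P3": 0.1},
    "P2": {"P1": 0.8, "P2": 1.0, "P3": 0.2},
    "P3": {"P1": 0.1, "P2": 0.2, "P3": 1.0},
}
k_lig = {
    "L1": {"L1": 1.0, "L2": 0.5},
    "L2": {"L1": 0.5, "L2": 1.0},
}
known = [("P1", "L1", 1), ("P2", "L1", 1), ("P2", "L2", -1), ("P3", "L2", 1)]

train = select_training_pairs("P1", known, k_prot, n_neighbors=1)
similarity = pair_kernel(k_prot, k_lig, ("P1", "L1"), ("P2", "L2"))
```

In this sketch the selected pairs and the pairwise kernel would then be handed to any SVM solver that accepts a precomputed kernel matrix. For a double-orphan query the intra-task list is empty, so the training set consists only of pairs borrowed from neighbouring proteins.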


Created:

2018-10-04
