FuzzyPPI: Human Proteome at Fuzzy Semantic Space
Source: DataCite Commons. Updated: 2026-03-27. Indexed: 2024-07-28
Download link:
https://figshare.com/articles/dataset/FuzzyPPI_Human_Proteome_at_Fuzzy_SemanticSpace/15439980
Description:
A large-scale protein-protein interaction (PPI) network of an organism provides key insights into cellular and molecular functions, signaling pathways, and underlying disease mechanisms. In the complete interactome of any given organism, the unexplored protein interactions far outnumber the known positive and negative interactions. For human, the 20,350 reviewed proteins can generate ~207 million potential interactions. However, the combination of all known PPI datasets contains only ~5.6 million positive and ~758k negative protein-protein interactions (NPPI), which together cover ~3.1% of the potential interactome. Moreover, conventional PPI prediction methods produce only binary results. At the same time, recent studies show that protein binding affinities may prove effective in detecting protein complexes, disease association analysis, signaling network reconstruction, etc. In this work we present a fuzzy semantic scoring function that uses the Gene Ontology (GO) graphs to assess the binding affinity between any two proteins at an organism level. We have implemented a distributed algorithm in Apache Spark that computes this function and used it to process the complete human PPI network of ~182 million potential interactions arising from the 19,106 reviewed proteins for which GO annotations are available. The quality of the computed scores has been validated against the available state-of-the-art methods on benchmark datasets.
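The pair counts and the coverage figure quoted in the description can be checked with a few lines of arithmetic. The sketch below assumes that "potential interactions" means unordered pairs of distinct proteins (self-interactions excluded); the description does not state this explicitly, and the function and variable names are illustrative only.

```python
from math import comb


def candidate_pairs(n_proteins: int) -> int:
    """Number of unordered pairs of distinct proteins (assumed definition)."""
    return comb(n_proteins, 2)


# Protein counts quoted in the description.
all_reviewed = candidate_pairs(20_350)   # all reviewed human proteins
go_annotated = candidate_pairs(19_106)   # reviewed proteins with GO annotations

# Known interactions quoted in the description (approximate figures).
known = 5_600_000 + 758_000              # positive + negative PPIs
coverage = known / all_reviewed

print(f"pairs from 20,350 proteins: {all_reviewed:,}")
print(f"pairs from 19,106 proteins: {go_annotated:,}")
print(f"known share of the potential interactome: {coverage:.1%}")
```

Under this assumption the two counts come out at roughly 207 million and 182 million, and the known interactions cover about 3.1% of the larger set, which agrees with the rounded figures in the description.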
Provider:
figshare
Created:
2021-08-19



