Supplemental Tables S1 and S2 for Combining structural modeling and deep learning to calculate the E. coli protein interactome and functional networks
Source: DataCite Commons. Updated: 2025-12-08. Indexed: 2026-04-25.
Download link:
https://figshare.com/articles/dataset/Supplemental_Tables_S1_and_S2_for_Combining_structural_modeling_and_deep_learning_to_calculate_the_E_coli_protein_interactome_and_functional_networks/30822977/1
Description:
We report on the integration of three methods that are computationally efficient enough to predict, on a proteome-wide scale, whether two proteins are likely to form a binary complex. The methods are PrePPI, which uses three-dimensional structure information as a basis for predictions; Topsy-Turvy, which analyzes sequences using a protein language model; and ZEPPI, which uses evolutionary information to evaluate protein-protein interfaces. We demonstrate how these methods can be integrated and validate the performance of the integrated method and its separate components at predicting E. coli protein-protein interactions (PPIs) through testing on the HINT high-quality literature-curated database of binary PPIs. The integrated method has better performance and identifies more high-confidence interactions than any of the component methods. The AF3Complex algorithm was used to predict the structures of 374 PPIs, a large fraction of which have interfaces that at least partially overlap with those of PrePPI models of the same complex. Finally, we clustered the high-confidence E. coli interactome and obtained 385 subnetworks that have high functional coherence, defined by enrichment of Gene Ontology Biological Process terms, illustrating that our methods, which contain no explicit functional information, provide biologically meaningful PPIs. Biological insights derived from the subnetworks, including the annotation of proteins of unknown function, are discussed in detail. The functional insights obtained from structure-based PPI predictions highlight the applicability of the comprehensive E. coli interactome presented here.
Provider:
figshare
Created:
2025-12-08



