Supporting Data for Zhao et al. "Combining structural modeling and deep learning to calculate the E. coli protein interactome and functional networks"
Source: Figshare. Updated 2026-02-19; indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Supporting_Data_for_Zhao_et_al_Combining_structural_modeling_and_deep_learning_to_calculate_the_i_E_coli_i_protein_interactome_and_functional_networks_/31362145
Description:
SupportingData Directory for Zhao et al. "Combining structural modeling and deep learning to calculate the E. coli protein interactome and functional networks"

Abstract

We report on the integration of three methods that predict, on a proteome-wide scale, whether two proteins are likely to form a binary complex. The methods are PrePPI, which uses three-dimensional structure information as a basis for predictions; Topsy-Turvy, which uses a protein language model; and ZEPPI, which uses evolutionary information to evaluate protein-protein interfaces. Testing on the high-quality HINT database of binary PPIs reveals that the integrated method performs better and identifies more high-confidence interactions than any of the component methods. The AF3Complex algorithm was used to predict the structures of 374 PPIs, a large fraction of which have interfaces that at least partially overlap with PrePPI models of the same complex. Clustering of the high-confidence E. coli interactome yields 385 subnetworks with high functional coherence. Biological insights derived from the subnetworks, including the annotation of proteins of unknown function, are discussed in detail.

Contents

The SupportingData directory contains:
- One final integrated prediction file
- Three main subfolders:
  - SI Tables
  - TrainingData_Human-PPI-Interactomes
  - TestData_Ecoli-PPI-Interactomes

1. Final Integrated Prediction File

ecoli_PPIs_LRINT_threeclues.csv
This file contains the final integrated prediction of E. coli protein–protein interactions (PPIs). It includes the three individual prediction clues, derived from PrePPI, ZEPPI, and D-Script (TT), together with the integrated prediction score INTLR (Integrated Likelihood Ratio) generated by the Bayesian model.

2. SI Tables Folder

This folder contains supplemental tables referenced in Zhao et al., 2025 (https://www.biorxiv.org/content/10.1101/2025.05.07.652715v1).

Table S1
- Contains 374 selected PPIs predicted with a high-confidence integrated likelihood ratio (LR).
- These interactions are considered challenging because the protein pairs exhibit low local sequence identity.
- Detailed selection criteria are described in the Methods section of the paper.

Table S2
- Contains network construction, clustering, and functional annotation results.
- Derived from interactome clustering analysis.

3. TrainingData_Human-PPI-Interactomes Folder

This folder contains the datasets used to train the Bayesian model. The training datasets include:
- Genome-wide human PPI predictions generated by PrePPI-AF, ZEPPI, and D-Script (TT).
- Experimentally validated human PPIs from HINT and STRING (located in /ExptDB).

Files Included

Human_TT_all.tsv
Default output from D-Script (TT, https://topsyturvy.csail.mit.edu). Columns are uniprotID1, uniprotID2, and the predicted probability score.

Human_ZEPPI_all.csv
Output from ZEPPI (https://github.com/honig-lab/ZEPPI). Columns used in this work are the first column (UniProtID pair) and the last column (ZEPPI score).

/ExptDB/2021HINT_Human_lcb_hq.txt
Downloaded from HINT (2021). Literature-curated, high-quality binary interaction dataset. Columns used for training: UniprotID_A and UniprotID_B.

/ExptDB/2022String_9606.physical.links.experimental.sortedpair.nonred.csv
Downloaded from STRING (2022). Physical experimental PPI dataset. Column used for training: UniprotID_A_UniprotID_B.

/ExptDB/human_allExpDB.csv
In-house curated dataset representing the largest compiled human experimental PPI dataset to date. Constructed as the union of PPIs from:
- HINT (binary and co-complex)
- Interactome3D
- APID_level0
- STRING_all
- BIOGRID_all
- HURI_all
- High-confidence in-house reference set (PrePPI-total 2016)

4. TestData_Ecoli-PPI-Interactomes Folder

This folder contains genome-wide PPI predictions for E. coli generated by PrePPI-AF, ZEPPI, and D-Script (TT). In this study, these predictions are integrated using a Bayesian model trained on human PPI data. The integrated score reflects evidence from:
- Structure-based modeling
- Protein language models
- Sequence co-evolution signals

Files Included

ecoli_PrePPI_SM_ns.csv
Prediction file from PrePPI structural modeling. Columns 1–2 are the UniProt IDs of the interacting proteins. Column 3 is the structural modeling likelihood score.

ecoli_ZEPPI_all.csv
Output from ZEPPI (https://github.com/honig-lab/ZEPPI). Columns used in this work are the first column (UniProtID pair) and the last column (ZEPPI score).

ecoliK12_TT_all.tsv
Default output from D-Script (TT, https://topsyturvy.csail.mit.edu). Columns are uniprotID1, uniprotID2, and the predicted probability score.
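
Example: combining per-method scores into a likelihood ratio

The description above says that per-method scores are integrated by a Bayesian model trained on human PPIs and applied to E. coli. The sketch below illustrates one common way such an integration can work, a naive Bayes product of per-bin likelihood ratios. It is not the authors' implementation: the bin edges, the pseudocount, the independence assumption, the underscore-joined pair key, the method names "structure" and "language", and all protein IDs (P1, A, B, ...) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def pair_key(a, b):
    """Order-independent key, so (A, B) and (B, A) refer to the same pair."""
    return "_".join(sorted((a, b)))


def bin_index(score, edges):
    """Index of the bin containing `score`, given ascending bin edges."""
    return sum(score >= e for e in edges)


def train_lr_table(scores, positives, edges, pseudocount=1.0):
    """Per-bin likelihood ratio P(bin | positive) / P(bin | negative).

    `scores` maps pair keys to one method's score; `positives` is the set
    of pair keys found in a reference set. Pseudocounts avoid zero counts.
    """
    n_bins = len(edges) + 1
    pos, neg = Counter(), Counter()
    for key, score in scores.items():
        target = pos if key in positives else neg
        target[bin_index(score, edges)] += 1
    pos_total = sum(pos.values()) + pseudocount * n_bins
    neg_total = sum(neg.values()) + pseudocount * n_bins
    return [
        ((pos[i] + pseudocount) / pos_total) / ((neg[i] + pseudocount) / neg_total)
        for i in range(n_bins)
    ]


def integrated_lr(pair, method_scores, lr_tables, edges):
    """Product of per-method LRs; a method with no score contributes LR = 1."""
    log_lr = 0.0
    for method, scores in method_scores.items():
        if pair in scores:
            b = bin_index(scores[pair], edges[method])
            log_lr += math.log(lr_tables[method][b])
    return math.exp(log_lr)


# Toy training data (hypothetical): five scored pairs, two of them positives.
train_scores = {
    pair_key("P1", "P2"): 0.9,
    pair_key("P3", "P1"): 0.8,
    pair_key("P2", "P3"): 0.1,
    pair_key("P2", "P4"): 0.2,
    pair_key("P3", "P4"): 0.15,
}
positives = {pair_key("P1", "P2"), pair_key("P1", "P3")}
table = train_lr_table(train_scores, positives, edges=[0.5])

# Apply the trained table to toy test pairs scored by two methods.
# Reusing one table for both methods is a shortcut for this toy example.
lr_tables = {"structure": table, "language": table}
edges = {"structure": [0.5], "language": [0.5]}
test_scores = {
    "structure": {pair_key("A", "B"): 0.7, pair_key("A", "C"): 0.9},
    "language": {pair_key("B", "A"): 0.6},
}
lr_ab = integrated_lr(pair_key("A", "B"), test_scores, lr_tables, edges)
lr_ac = integrated_lr(pair_key("A", "C"), test_scores, lr_tables, edges)
print(table, lr_ab, lr_ac)
```

In this toy setup a pair supported by both methods receives a larger combined ratio than a pair scored by only one. Working in log space keeps the product numerically stable when many evidence sources are multiplied.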
Created:
2026-02-19



