A dataset for predicting protein-protein interactions in humans
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.15dv41p84
Description:
Protein-protein interactions (PPIs) are fundamental to biological function. While recent advances in coevolutionary analysis and deep learning (DL)-based structure prediction have enabled large-scale PPI identification in bacterial and yeast proteomes, their application to the more complex human proteome has remained limited. To address this challenge, we 1) enhanced coevolutionary signals by generating 7-fold deeper multiple sequence alignments (MSAs) from 30 petabytes of unassembled genomic data, and 2) developed a new DL model trained on augmented datasets of domain-domain interactions derived from 200 million predicted protein structures. These improvements led to a 4-fold increase in the performance of our de novo PPI prediction pipeline for human proteins. We systematically screened around 190 million human protein pairs and predicted 17,849 high-confidence PPIs at an estimated precision of 90%, including 3,631 interactions not previously detected by experimental methods. The resulting dataset includes omicsMSA alignments, training data (domain-domain and protein-protein interactions), high-confidence predicted pairs, oligomeric assemblies inferred from predicted and known interactions, novel components predicted for known complexes, structural models (PDB format), contact probabilities from AlphaFold and RoseTTAFold2-PPI, and DCA scores.
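The headline figures in the description can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the protein count of 19,500 is a hypothetical round number chosen because it yields roughly 190 million unordered pairs, not a value stated by the dataset authors, and the helper names are invented for this example.

```python
from math import comb


def unordered_pairs(n_proteins: int) -> int:
    """Number of distinct protein pairs, excluding self-pairs."""
    return comb(n_proteins, 2)


def expected_true_positives(n_predicted: int, precision: float) -> int:
    """Predictions expected to be correct at a given estimated precision."""
    return round(n_predicted * precision)


# ASSUMPTION: hypothetical proteome size, not taken from the dataset.
n_proteins = 19_500

# Figures quoted in the description.
n_predicted = 17_849
n_novel = 3_631
precision = 0.90

print(unordered_pairs(n_proteins))                      # size of the pair space screened
print(expected_true_positives(n_predicted, precision))  # expected correct predictions
print(f"{n_novel / n_predicted:.1%}")                   # share of predictions that are novel
```

Under these assumptions, about one in five of the high-confidence predictions would be interactions not previously detected experimentally, and roughly 16,000 of the 17,849 predictions would be expected to be correct.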
Created:
2025-09-16



