
A dataset for predicting protein-protein interactions in humans

Source: DataCite Commons · Updated 2026-01-28 · Indexed 2026-04-25
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.15dv41p84
Description:
Protein-protein interactions (PPIs) are fundamental to biological function. While recent advances in coevolutionary analysis and deep learning (DL)-based structure prediction have enabled large-scale PPI identification in bacterial and yeast proteomes, their application to the more complex human proteome has remained limited. To address this challenge, we 1) enhanced coevolutionary signals by generating 7-fold deeper multiple sequence alignments (MSAs) from 30 petabytes of unassembled genomic data, and 2) developed a new DL model trained on augmented datasets of domain-domain interactions derived from 200 million predicted protein structures. These improvements led to a 4-fold increase in the performance of our de novo PPI prediction pipeline for human proteins. We systematically screened around 190 million human protein pairs and predicted 17,849 high-confidence PPIs at an estimated precision of 90%, including 3,631 interactions not previously detected by experimental methods. The resulting dataset includes omicsMSA alignments; training data (domain-domain and protein-protein interactions); high-confidence predicted pairs; oligomeric assemblies inferred from predicted and known interactions; novel components predicted for known complexes; structural models (PDB format); contact probabilities from AlphaFold and RoseTTAFold2-PPI; and direct coupling analysis (DCA) scores.
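As a rough sanity check on the figures above, the sketch below recomputes the scale of an all-against-all screen and what the stated 90% precision implies. It assumes a human proteome of about 19,500 proteins with one representative sequence per gene; that number is an illustrative assumption, not a value given in the dataset description, and self-pairs (homodimers) are excluded.

```python
from math import comb

# Assumption (not from the dataset description): ~19,500 human proteins,
# one representative per gene.
N_PROTEINS = 19_500

# Unordered pairs of distinct proteins: n * (n - 1) / 2.
n_pairs = comb(N_PROTEINS, 2)
print(f"{n_pairs:,} candidate pairs")  # 190,115,250 candidate pairs

# Figures stated in the description.
HIGH_CONFIDENCE = 17_849
PRECISION = 0.90
NOVEL = 3_631

# Expected number of true interactions among the high-confidence set.
expected_true = round(HIGH_CONFIDENCE * PRECISION)
print(expected_true)  # 16064

# Share of high-confidence predictions not previously detected experimentally.
novel_share = NOVEL / HIGH_CONFIDENCE
print(f"{novel_share:.1%}")  # 20.3%
```

Under this assumption, an exhaustive pairwise screen lands close to the "around 190 million" pairs reported, which suggests the screen covered essentially every pair of human proteins rather than a preselected subset.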
Provider:
Dryad
Created:
2025-09-16