A dataset for predicting protein-protein interactions in humans
Source: DataCite Commons. Updated 2026-01-28; indexed 2026-04-25.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.15dv41p84
Description:
Protein-protein interactions (PPIs) are fundamental to biological
function. While recent advances in coevolutionary analysis and deep
learning (DL)-based structure prediction have enabled large-scale PPI
identification in bacterial and yeast proteomes, their application to the
more complex human proteome has remained limited. To address this
challenge, we 1) enhanced coevolutionary signals by generating 7-fold
deeper multiple sequence alignments (MSAs) from 30 petabytes of
unassembled genomic data, and 2) developed a new DL model trained on
augmented datasets of domain-domain interactions derived from 200 million
predicted protein structures. These improvements led to a 4-fold increase
in the performance of our de novo PPI prediction pipeline for human
proteins. We systematically screened around 190 million human protein
pairs and predicted 17,849 high-confidence PPIs at an estimated precision
of 90%, including 3,631 interactions not previously detected by
experimental methods. The resulting dataset includes omicsMSA alignments,
training data (domain-domain and protein-protein interactions),
high-confidence predicted pairs, oligomeric assemblies inferred from
predicted and known interactions, novel components predicted for known
complexes, structural models (PDB format), contact probabilities from
AlphaFold and RoseTTAFold2-PPI, and DCA scores.
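Two of the figures above can be checked with simple arithmetic. The roughly 190 million screened pairs match the number of unordered pairs, n(n-1)/2, for a proteome of about 19,500 proteins. A precision of 90% over 17,849 predictions implies about 1,785 expected false positives. The sketch below assumes that proteome size (a hypothetical value chosen for illustration, not stated in the source) and assumes the 90% precision estimate applies uniformly across the whole high-confidence set.

```python
from math import comb


def unordered_pairs(n: int) -> int:
    # Number of distinct protein pairs, excluding self-pairs: n*(n-1)/2.
    return comb(n, 2)


def expected_counts(n_predicted: int, precision: float) -> tuple[float, float]:
    # Expected true and false positives, assuming precision holds uniformly.
    tp = n_predicted * precision
    return tp, n_predicted - tp


# 19_500 is a hypothetical proteome size used only for illustration.
pairs = unordered_pairs(19_500)
tp, fp = expected_counts(17_849, 0.90)
print(f"{pairs:,}")        # 190,115,250
print(round(tp), round(fp))  # 16064 1785
```

The pair count grows quadratically with proteome size. This is why an exhaustive all-against-all screen of the human proteome is far larger than comparable screens of bacterial or yeast proteomes.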
Provider:
Dryad
Created:
2025-09-16