Large protein databases reveal structural complementarity and functional locality
Source: DataCite Commons. Updated: 2025-07-12. Indexed: 2025-09-08.
Download link:
https://figshare.com/articles/dataset/Large_protein_databases_reveal_structural_complementarity_and_functional_locality/27203073/2
Description:
Recent breakthroughs in protein structure prediction have led to an unprecedented surge in high-quality 3D models, highlighting the need for efficient computational solutions to manage and analyze this wealth of structural data. In our work, we comprehensively examine the structural clusters obtained from the AlphaFold Protein Structure Database (AFDB), a high-quality subset of ESMAtlas, and the Microbiome Immunity Project (MIP). We create a single cohesive low-dimensional representation of the resulting protein space. Our results show that, while each database occupies distinct regions within the protein structure space, they collectively exhibit significant overlap in their functional profiles. High-level biological functions tend to cluster in particular regions, revealing a shared functional landscape despite the diverse sources of data. By creating a single, cohesive low-dimensional representation of protein structure space integrating data from diverse sources, localizing functional annotations within this space, and providing an open-access web-server for exploration, this work offers insights for future research concerning protein sequence-structure-function relationships, enabling various biological questions to be asked about taxonomic assignments, environmental factors, or functional specificity. This approach is generalizable to other or future datasets, enabling further discovery beyond findings presented here.
Provider:
figshare
Created:
2025-05-07



