Functional Annotation of the Human Chromosome 7 “Missing” Proteins: A Bioinformatics Approach

NIAID Data Ecosystem, indexed 2026-03-07

Download link:

https://figshare.com/articles/dataset/Functional_Annotation_of_the_Human_Chromosome_7_Missing_Proteins_A_Bioinformatics_Approach/2408473

Description:

The chromosome-centric human proteome project aims to systematically map all human proteins, chromosome by chromosome, in a gene-centric manner through dedicated efforts from national and international teams. This mapping will lead to a knowledge-based resource defining the full set of proteins encoded in each chromosome and laying the foundation for the development of a standardized approach to analyze the massive proteomic data sets currently being generated. The neXtProt database lists 946 proteins as the human proteome of chromosome 7. However, 170 (18%) proteins of human chromosome 7 have no evidence at the proteomic, antibody, or structural levels and are considered “missing” in this study as they lack experimental support. We have developed a protocol for the functional annotation of these “missing” proteins by integrating several bioinformatics analysis and annotation tools, sequential BLAST homology searches, protein domain/motif and gene ontology (GO) mapping, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. Using the BLAST search strategy, homologues for reviewed non-human mammalian proteins with protein evidence were identified for 90 “missing” proteins while another 38 had reviewed non-human mammalian homologues. Putative functional annotations were assigned to 27 of the remaining 43 novel proteins. Proteotypic peptides have been computationally generated to facilitate rapid identification of these proteins. Four of the “missing” chromosome 7 proteins have been substantiated by the ENCODE proteogenomic peptide data.
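The description states that proteotypic peptides were generated computationally but does not say how. As a rough illustration only, the sketch below performs an in silico tryptic digest and keeps peptides in a mass-spectrometry-friendly length window. The cleavage rule (cut after K or R unless the next residue is P), the length bounds, the function names, and the toy sequence are all assumptions made for this example and are not taken from the dataset. A real proteotypic prediction would also check that each peptide is unique within the proteome and likely to be detected.

```python
def tryptic_digest(sequence):
    """Split a protein sequence at assumed trypsin sites.

    Assumed rule: cleave after K or R, except when the next residue is P.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep the C-terminal remainder if the sequence does not end at a cut site.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def candidate_peptides(sequence, min_len=7, max_len=25):
    """Keep digested peptides whose length falls in a hypothetical window."""
    return [p for p in tryptic_digest(sequence) if min_len <= len(p) <= max_len]


if __name__ == "__main__":
    toy_sequence = "AAKPGGRCCK"  # made-up sequence, not a chromosome 7 protein
    print(tryptic_digest(toy_sequence))
    print(candidate_peptides(toy_sequence))
```

The digest and the length filter are kept as separate functions so that the filter can be swapped for a stricter criterion, such as a uniqueness check against a reference proteome, without touching the cleavage logic.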

Created:

2016-02-19
