Identification and Validation of Human Missing Proteins and Peptides in Public Proteome Databases: Data Mining Strategy
Source: NIAID Data Ecosystem (indexed 2026-03-10)
Download link:
https://figshare.com/articles/dataset/Identification_and_Validation_of_Human_Missing_Proteins_and_Peptides_in_Public_Proteome_Databases_Data_Mining_Strategy/5557912
Description:
In an attempt to complete
the Human Proteome Project (HPP), the Chromosome-Centric Human Proteome Project
(C-HPP) began investigating missing proteins (MPs)
in 2012. However, 2579 and 572 protein entries in neXtProt (2017-1)
are still considered missing and uncertain proteins, respectively.
Thus, in this study, we proposed a pipeline to analyze, identify,
and validate human missing and uncertain proteins in open-access transcriptomics
and proteomics databases. Analysis of RNA expression patterns for missing
proteins in the Human Protein Atlas showed that 28% of them, such as olfactory
receptor 1I1 (O60431), had no RNA expression, suggesting the need to consider uncommon
tissues for transcriptomic and proteomic studies. Interestingly, 21%
had elevated expression in a particular tissue (tissue-enriched
proteins), indicating the importance of targeting such proteins in
the tissues where they are enriched. Additionally, the analysis of RNA expression
levels for missing proteins showed that 95% had no or low expression
(0–10 transcripts per million), indicating that low abundance
is one of the major obstacles to detecting missing proteins.
Moreover, missing proteins are predicted to generate fewer
unique tryptic peptides than identified proteins. Searching for
these predicted unique tryptic peptides that correspond to missing
and uncertain proteins in the experimental peptide list of open-access
MS-based databases (PA, GPM) resulted in the detection of 402 missing
and 19 uncertain proteins with at least two unique peptides (≥9
aa) at an FDR of <5 × 10⁻⁴%. Finally, matching
the native spectra for the experimentally detected peptides with their
SRMAtlas synthetic counterparts at three transition sources (QQQ,
QTOF, QTRAP) allowed us to validate 41 missing proteins
with ≥2 proteotypic peptides each.
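
The peptide-matching step described above (predict unique tryptic peptides, then look them up in an experimental peptide list and keep proteins with at least two hits of ≥9 aa) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the cleavage rule (after K or R, not before P, no missed cleavages), the function names, and the toy accessions and sequences are all assumptions made for illustration, and FDR filtering is not modeled.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence, min_len=9):
    """In-silico trypsin digest (assumed rule: cleave after K/R unless
    the next residue is P; no missed cleavages). Keeps peptides of at
    least min_len residues."""
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in peptides if len(p) >= min_len]


def unique_peptides(proteome, min_len=9):
    """Map each accession to the predicted peptides found in no other
    protein of the given proteome (dict: accession -> sequence)."""
    owners = defaultdict(set)
    for accession, sequence in proteome.items():
        for peptide in tryptic_digest(sequence, min_len):
            owners[peptide].add(accession)
    unique = defaultdict(set)
    for peptide, accessions in owners.items():
        if len(accessions) == 1:
            unique[next(iter(accessions))].add(peptide)
    return dict(unique)


def detected_proteins(unique_by_protein, observed, min_peptides=2):
    """Keep proteins with at least min_peptides unique peptides present
    in the observed (experimental) peptide set."""
    hits = {}
    for accession, peptides in unique_by_protein.items():
        found = peptides & observed
        if len(found) >= min_peptides:
            hits[accession] = found
    return hits


# Toy data only: hypothetical accessions and artificial sequences.
toy_proteome = {
    "TOY1": "A" * 9 + "K" + "C" * 9 + "R" + "G" * 9 + "K",
    "TOY2": "D" * 9 + "K" + "G" * 9 + "K" + "EEEEK",
}
toy_observed = {
    "A" * 9 + "K",
    "C" * 9 + "R",
    "D" * 9 + "K",
    "G" * 9 + "K",  # shared by both toy proteins, so never counted
}

toy_unique = unique_peptides(toy_proteome)
toy_hits = detected_proteins(toy_unique, toy_observed)
print(sorted(toy_hits))
```

In this toy case only TOY1 passes, because TOY2 has a single unique peptide of sufficient length. A real analysis would substitute neXtProt sequences for the proteome and the PA and GPM peptide lists for the observed set.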
Created:
2017-10-31



