MScDB: A Mass Spectrometry-centric Protein Sequence Database for Proteomics
Source: NIAID Data Ecosystem (indexed 2026-03-07)
Download link:
https://figshare.com/articles/dataset/MScDB_A_Mass_Spectrometry_centric_Protein_Sequence_Database_for_Proteomics/2408887
Description:
Protein sequence databases are indispensable tools for life science
research, including mass spectrometry (MS)-based proteomics. In current
database construction processes, sequence similarity clustering is
used to reduce redundancies in the source data. Albeit powerful, it
ignores the peptide-centric nature of proteomic data and the fact
that MS is able to distinguish similar sequences. Therefore, we introduce
an approach that structures the protein sequence space at the peptide
level using theoretical and empirical information from large-scale
proteomic data to generate a mass spectrometry-centric protein sequence
database (MScDB). The core modules of MScDB are an in-silico proteolytic digest and a peptide-centric clustering algorithm that
groups protein sequences that are indistinguishable by mass spectrometry.
Analysis of various MScDB use cases against five complex human proteomes
resulted in 69 peptide identifications not present in UniProtKB as
well as 79 putative single amino acid polymorphisms. MScDB retains
∼99% of the identifications in comparison to common databases
despite a 3–48% increase in the theoretical peptide search
space (but comparable protein sequence space). In addition, MScDB
enables cross-species applications such as human/mouse graft models,
and our results suggest that the uncertainty in protein assignments
to one species can be smaller than 20%.
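
The two core modules named above, an in-silico proteolytic digest and a peptide-centric clustering step, can be illustrated with a minimal sketch. This is not the authors' implementation: the simplified trypsin rule (cleave after K or R unless followed by P), the `min_length` cutoff, the function names, and the toy sequences are all assumptions made for illustration, and the sketch uses only theoretical peptides, not the empirical information that MScDB also draws on.

```python
from collections import defaultdict


def cleave(sequence):
    """Split a sequence after K or R unless the next residue is P.

    Assumption: a simplified trypsin rule with no missed cleavages.
    """
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def digest(sequence, min_length=6):
    """Return the set of peptides long enough to be considered observable.

    Assumption: peptides shorter than min_length are treated as undetectable.
    """
    return frozenset(p for p in cleave(sequence) if len(p) >= min_length)


def cluster_by_peptides(proteins, min_length=6):
    """Group protein IDs whose theoretical peptide sets are identical."""
    groups = defaultdict(list)
    for protein_id, sequence in proteins.items():
        groups[digest(sequence, min_length)].append(protein_id)
    return sorted(sorted(ids) for ids in groups.values())


# Hypothetical toy sequences: P1 and P2 differ only in a 2-residue peptide.
toy_proteins = {
    "P1": "MAAAAAAKGGGGGGRAK",
    "P2": "MAAAAAAKGGGGGGRCK",
    "P3": "MAAAAAAKGGGGGGGR",
}
clusters = cluster_by_peptides(toy_proteins)
```

In this toy input, P1 and P2 share every peptide above the length cutoff and therefore fall into one group, while P3 carries a distinct peptide and stays separate. Sequence similarity clustering would likely merge all three, which is the contrast the description draws.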
Created:
2016-02-19



