Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics
Source: NIAID Data Ecosystem (indexed 2026-03-09)
Download link:
https://figshare.com/articles/dataset/Tiered_Human_Integrated_Sequence_Search_Databases_for_Shotgun_Proteomics/3822615
Description:
The results of analysis of shotgun proteomics mass spectrometry
data can be greatly affected by the selection of the reference protein
sequence database against which the spectra are matched. For many
species there are multiple sources from which somewhat different sequence
sets can be obtained. This can lead to confusion about which database
is best in which circumstances, a problem especially acute in
human sample analysis. All sequence databases are genome-based, with
sequences for the predicted genes and their protein translation products
compiled. Our goal is to create a set of primary sequence databases
that comprise the union of sequences from many of the different available
sources and make the result easily available to the community. We
have compiled a set of four sequence databases of varying sizes, from
a small database consisting of only the ∼20,000 primary isoforms
plus contaminants to a very large database that includes almost all
nonredundant protein sequences from several sources. This set of tiered,
increasingly complete human protein sequence databases suitable for
mass spectrometry proteomics sequence database searching is called
the Tiered Human Integrated Search Proteome (THISP) set. To evaluate
the utility of these databases, we have analyzed two different data
sets, one from the HeLa cell line and the other from normal human
liver tissue, with each of the four tiers of database complexity.
The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides
can be identified for Tiers 2, 3, and 4, respectively, as compared
with the Tier 1 database, at substantially increasing computational
cost. This increase in computational cost may be worth bearing if
the identification of sequence variants or the discovery of sequences
that are not present in the reviewed knowledge base entries is an
important goal of the study. We find that it is useful to search a
data set against a simpler database, and then check the uniqueness
of the discovered peptides against a more complex database. We have
set up an automated system that downloads all the source databases
on the first of each month, generates a new set of
search databases, and makes them available for download at http://www.peptideatlas.org/thisp/.
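
The two-step workflow described in the abstract (search against a simpler database, then check the uniqueness of the discovered peptides against a more complex one) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names `read_fasta` and `peptide_matches` are hypothetical, the toy sequences are made up, and plain substring matching is an assumption that ignores isoleucine/leucine ambiguity and enzymatic cleavage context.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def peptide_matches(peptides, fasta_handle):
    """Map each peptide to the accessions of all entries whose sequence contains it."""
    hits = {p: [] for p in peptides}
    for header, seq in read_fasta(fasta_handle):
        # Assumption: the accession is the first whitespace-delimited token.
        accession = (header.split() or [""])[0]
        for p in peptides:
            if p in seq:
                hits[p].append(accession)
    return hits


# Toy stand-in for a larger-tier database; these sequences are invented.
TOY_TIER = """>P1 toy protein one
MKTAYIAKQRQISFVK
>P2 toy isoform of P1
MKTAYIAKQRGGSFVK
>P3 unrelated toy protein
MSHFSRQLEERLGLIEVQ
"""

# Peptides assumed to have been identified in a search against a smaller tier.
hits = peptide_matches(["TAYIAK", "QISFVK", "AAAAAA"], StringIO(TOY_TIER))

# A peptide is treated as unique if exactly one entry contains it.
unique = sorted(p for p, accs in hits.items() if len(accs) == 1)
```

In this toy case `TAYIAK` maps to both `P1` and `P2`, so it would not distinguish the two entries, while `QISFVK` maps only to `P1`. For a real database, the `StringIO` object would be replaced by an open file handle on the downloaded FASTA file.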
Created:
2016-10-31



