Integrated Proteomic Pipeline Using Multiple Search Engines for a Proteogenomic Study with a Controlled Protein False Discovery Rate
Source: NIAID Data Ecosystem. Indexed: 2026-03-09
Download link:
https://figshare.com/articles/dataset/Integrated_Proteomic_Pipeline_Using_Multiple_Search_Engines_for_a_Proteogenomic_Study_with_a_Controlled_Protein_False_Discovery_Rate/3793386
Description:
In the Chromosome-Centric Human Proteome Project (C-HPP), false-positive
identification by peptide spectrum matches (PSMs) after database searches
is a major issue for proteogenomic studies using liquid-chromatography
and mass-spectrometry-based large proteomic profiling. Here we developed
a simple strategy for protein identification, with a controlled false
discovery rate (FDR) at the protein level, using an integrated proteomic
pipeline (IPP) that consists of four steps as follows. First,
using three different search engines, SEQUEST, MASCOT, and MS-GF+,
individual proteomic searches were performed against the neXtProt
database. Second, the search results from the PSMs were combined using
statistical evaluation tools including DTASelect and Percolator. Third,
the peptide search scores were converted into normalized E-scores
using an in-house program. Last, ProteinInferencer was used to filter
the proteins containing two or more peptides with a controlled FDR
of 1.0% at the protein level. Finally, we compared the performance
of the IPP to a conventional proteomic pipeline (CPP) for protein
identification using a controlled FDR of <1% at the protein level.
Using the IPP, a total of 5756 proteins (vs 4453 using the CPP) including
477 alternative splicing variants (vs 182 using the CPP) were identified
from human hippocampal tissue. In addition, a total of 10 missing
proteins (vs 7 using the CPP) were identified with two or more unique
peptides, and their tryptic peptides were validated using MS/MS spectral
patterns from a repository database or their corresponding synthetic
peptides. This study shows that the IPP effectively improved the identification
of proteins, including alternative splicing variants and missing proteins,
in human hippocampal tissues for the C-HPP. All RAW files used in
this study were deposited in ProteomeXchange (PXD000395).
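
The final filtering step described above (proteins with two or more peptides, FDR of 1.0% at the protein level) can be illustrated with a minimal sketch. It assumes a target-decoy estimate of FDR (decoys / targets above a score cutoff), which the description does not state explicitly; it is not the ProteinInferencer algorithm, and the `Protein` record, its `score` field, and `filter_proteins` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    accession: str
    score: float      # hypothetical combined score; higher means more confident
    n_peptides: int   # number of distinct peptides supporting the protein
    is_decoy: bool    # True for decoy entries (assumes a target-decoy search)


def filter_proteins(proteins, max_fdr=0.01, min_peptides=2):
    """Return target proteins passing a peptide-count and protein-level FDR filter.

    FDR at a given cutoff is estimated as decoys / targets among the
    proteins ranked at or above that cutoff (an assumed, common estimator).
    """
    # Keep only proteins supported by enough peptides, best score first.
    candidates = sorted(
        (p for p in proteins if p.n_peptides >= min_peptides),
        key=lambda p: p.score,
        reverse=True,
    )

    best_cut = 0
    targets = decoys = 0
    for i, p in enumerate(candidates, start=1):
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest rank at which the estimated FDR is acceptable.
        if targets and decoys / targets <= max_fdr:
            best_cut = i

    return [p for p in candidates[:best_cut] if not p.is_decoy]


# Synthetic example data (not from the study).
example = [Protein(f"T{i}", 1000.0 - i, 2, False) for i in range(200)]
example += [
    Protein("D0", 899.5, 2, True),
    Protein("D1", 850.5, 2, True),
    Protein("D2", 849.5, 2, True),
    Protein("ONEHIT", 2000.0, 1, False),  # high score but a single peptide
]
accepted = filter_proteins(example)
accepted_ids = {p.accession for p in accepted}
```

In this synthetic example the single-peptide protein is dropped regardless of its score, and the accepted list ends just before the second decoy pushes the estimated FDR above 1%.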
Created:
2016-10-31



