An integrated MS data processing strategy for fast identification, in-depth and reproducible quantification of protein O-glycosylation in large cohorts of human urine samples

Indexed by NIAID Data Ecosystem on 2026-03-11

Download link:

https://www.omicsdi.org/dataset/pride/PXD015987

Description:

Protein O-glycosylation has long been recognized as closely associated with many diseases, particularly with tumor proliferation, invasion and metastasis. The ability to efficiently profile variation in O-glycosylation across large-scale clinical samples provides an important approach for developing biomarkers for cancer diagnosis and for evaluating therapeutic response. Mass spectrometry (MS)-based techniques for high-throughput, in-depth and reliable elucidation of protein O-glycosylation in large clinical cohorts are therefore in high demand. However, the abundance of serine and threonine residues in the proteome, combined with the tens of mammalian O-glycan types, produces an extremely large search space of millions of theoretical peptide and O-glycan combinations for intact O-glycopeptide database searching. As a result, database searching takes exceptionally long, which is a major obstacle in O-glycoproteome studies of large clinical cohorts. More importantly, because intact O-glycopeptides are low in abundance and ionize poorly, and because data-dependent MS2 acquisition is stochastic, the level of missing data inevitably rises substantially as the number of samples increases, which undermines quantitative comparison across samples. To address this, we report a new MS data processing strategy that integrates glycoform-specific database searching, reference library-based MS1 feature matching and MS2 identification propagation for fast identification and in-depth, reproducible label-free quantification of O-glycosylation of human urinary proteins. This strategy increases database searching speed by up to 20-fold and yields a 30-40% improvement in intact O-glycopeptide quantification in individual samples, with markedly improved reproducibility. In total, we obtained quantitative information for 1068 intact O-glycopeptides across 36 healthy human urine samples, with a 30-40% reduction in the amount of missing data. This is currently the largest dataset of the urinary O-glycoproteome and demonstrates the application potential of this new strategy in large-scale clinical investigations.
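
The idea of reference library-based MS1 feature matching can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the `match_library` function, the 10 ppm mass tolerance, the 1-minute retention time window and the example values are all hypothetical assumptions chosen for illustration. The sketch assumes a library of O-glycopeptides already identified by MS2 in at least one run, and looks for a matching MS1 feature in a run where no MS2 identification was made, so that the identification can be carried over and the otherwise missing value filled in.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LibraryEntry:
    """Hypothetical reference library record for one identified O-glycopeptide."""
    glycopeptide: str
    mz: float
    charge: int
    rt_min: float


@dataclass(frozen=True)
class Ms1Feature:
    """Hypothetical MS1 feature detected in a single run."""
    mz: float
    charge: int
    rt_min: float
    intensity: float


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def match_library(
    library: List[LibraryEntry],
    features: List[Ms1Feature],
    ppm_tol: float = 10.0,      # assumed tolerance, not taken from the source
    rt_tol_min: float = 1.0,    # assumed tolerance, not taken from the source
) -> Dict[str, Optional[float]]:
    """Assign each library entry the most intense MS1 feature within tolerance.

    Entries without a matching feature map to None (a missing value).
    """
    quantified: Dict[str, Optional[float]] = {}
    for entry in library:
        candidates = [
            f for f in features
            if f.charge == entry.charge
            and abs(ppm_error(f.mz, entry.mz)) <= ppm_tol
            and abs(f.rt_min - entry.rt_min) <= rt_tol_min
        ]
        if candidates:
            quantified[entry.glycopeptide] = max(
                candidates, key=lambda f: f.intensity
            ).intensity
        else:
            quantified[entry.glycopeptide] = None
    return quantified


# Made-up example values: the first entry has a feature within both tolerances,
# the second has a feature at the right m/z but 5 minutes away in retention time.
library = [
    LibraryEntry("PEPTIDE_A+HexNAc(1)", 800.4000, 2, 25.0),
    LibraryEntry("PEPTIDE_B+HexNAc(1)Hex(1)", 950.5000, 3, 40.0),
]
features = [
    Ms1Feature(800.4040, 2, 25.3, 1.2e6),
    Ms1Feature(950.5000, 3, 45.0, 5.0e5),
]
result = match_library(library, features)
```

In practice, retention times would normally need to be aligned between runs before such matching, and the tolerances would depend on the instrument and chromatography; neither is specified in this description.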

Created:

2020-02-04