An integrated MS data processing strategy for fast identification, in-depth and reproducible quantification of protein O-glycosylation in large cohorts of human urine samples

NIAID Data Ecosystem, indexed 2026-03-11
Download link:
https://www.omicsdi.org/dataset/pride/PXD015987
Description:
Protein O-glycosylation has long been recognized as closely associated with many diseases, particularly with tumor proliferation, invasion and metastasis. Efficiently profiling the variation of O-glycosylation in large-scale clinical samples is an important route to developing biomarkers for cancer diagnosis and for evaluating therapeutic response. Mass spectrometry (MS)-based techniques for high-throughput, in-depth and reliable elucidation of protein O-glycosylation in large clinical cohorts are therefore in high demand. However, the abundance of serine and threonine residues in the proteome, combined with the tens of mammalian O-glycan types, produces an extremely large search space of millions of theoretical peptide and O-glycan combinations for intact O-glycopeptide database searching. As a result, database searching takes an exceptionally long time, which is a major obstacle in O-glycoproteome studies of large clinical cohorts. More importantly, because intact O-glycopeptides are low in abundance and ionize poorly, and because data-dependent MS2 acquisition is stochastic, the level of missing data inevitably rises substantially as the number of samples increases, which undermines quantitative comparison across samples. We therefore report a new MS data processing strategy that integrates glycoform-specific database searching, reference library-based MS1 feature matching and MS2 identification propagation for fast identification and in-depth, reproducible label-free quantification of O-glycosylation of human urinary proteins. This strategy increases database searching speed by up to 20-fold and yields a 30-40% increase in intact O-glycopeptide quantification in individual samples, with clearly improved reproducibility. In total, we obtained quantitative information for 1068 intact O-glycopeptides across 36 healthy human urine samples, with a 30-40% reduction in the amount of missing data. This is currently the largest dataset of the urinary O-glycoproteome, and it demonstrates the application potential of this new strategy in large-scale clinical investigations.
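
The idea behind reference library-based MS1 feature matching can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record types, the 10 ppm mass tolerance, the 1 minute retention-time window and every function name below are hypothetical choices made only for illustration, and retention times are assumed to be already aligned across runs. The sketch looks up each library glycopeptide among a sample's MS1 features, so a glycopeptide can be quantified even when it was not selected for MS2 in that sample, and then reports the fraction of missing values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LibraryEntry:
    """Hypothetical reference-library record built from MS2 identifications."""
    glycopeptide: str  # identifier of the intact O-glycopeptide
    mz: float          # precursor m/z
    charge: int
    rt_min: float      # retention time in minutes (assumed aligned across runs)


@dataclass(frozen=True)
class Feature:
    """Hypothetical MS1 feature detected in one sample."""
    mz: float
    charge: int
    rt_min: float
    intensity: float


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Signed mass error of an observed m/z in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def match_feature(entry: LibraryEntry, features: List[Feature],
                  ppm_tol: float = 10.0,
                  rt_tol_min: float = 1.0) -> Optional[Feature]:
    """Return the most intense feature within the assumed tolerances, or None."""
    candidates = [
        f for f in features
        if f.charge == entry.charge
        and abs(ppm_error(f.mz, entry.mz)) <= ppm_tol
        and abs(f.rt_min - entry.rt_min) <= rt_tol_min
    ]
    return max(candidates, key=lambda f: f.intensity, default=None)


def quantify(library: List[LibraryEntry],
             features: List[Feature]) -> Dict[str, Optional[float]]:
    """Map each library glycopeptide to a matched intensity, or None if missing."""
    row: Dict[str, Optional[float]] = {}
    for entry in library:
        matched = match_feature(entry, features)
        row[entry.glycopeptide] = matched.intensity if matched else None
    return row


def missing_fraction(table: List[Dict[str, Optional[float]]]) -> float:
    """Fraction of missing (None) values across all samples and glycopeptides."""
    values = [v for row in table for v in row.values()]
    return sum(v is None for v in values) / len(values) if values else 0.0
```

Calling `quantify` once per sample and passing the resulting rows to `missing_fraction` gives a single number that can be compared with and without library-based matching. Real pipelines would typically also check isotope patterns and control false matches, which this sketch omits.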

Created:
2020-02-04