Enhancing Top-Down Proteomics Data Analysis by Combining Deconvolution Results through a Machine Learning Strategy

NIAID Data Ecosystem, indexed 2026-03-11

Download link:

https://www.omicsdi.org/dataset/pride/PXD018043

Description:

Top-down mass spectrometry (MS) is a powerful tool for identifying and comprehensively characterizing proteoforms arising from alternative splicing, sequence variation, and post-translational modifications. Despite its power, the technique is hampered by the complexity of the datasets it generates, which must pass through sequential processing steps before they can be interpreted. Deconvolution of the complex isotopic distributions that arise from naturally occurring isotopes is a critical step in this processing. Multiple algorithms are currently available to deconvolute top-down mass spectra; however, each generates a different deconvoluted peak list, with accuracy that varies when compared with true positive annotations. In this study, we designed a machine learning strategy that processes and combines the peak lists produced by different deconvolution algorithms. After optimizing the clustering results, we combined the deconvolution results from THRASH, TopFD, MS-Deconv, and SNAP into consensus peak lists at various thresholds, using either a simple voting ensemble or a random forest machine learning algorithm. The random forest model outperformed the best single algorithm. This strategy could enhance the accuracy of, and confidence in, protein identification during database searching by improving the detection of true positive peaks while filtering out false positive peaks. Thus, the method shows promise for enhancing proteoform identification and characterization in high-throughput top-down proteomics data analysis.
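The simple voting ensemble mentioned above can be illustrated with a minimal Python sketch. It assumes each algorithm's output is a list of (monoisotopic mass, intensity) pairs, groups peaks across algorithms with a greedy 10 ppm mass tolerance, and keeps a cluster only when at least `min_votes` distinct algorithms report it. The tolerance, the greedy clustering rule, the intensity-weighted consensus mass, the function names, and the peak values are all illustrative assumptions, not the clustering or thresholds used in the study; the random forest variant is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mass: float       # deconvoluted monoisotopic mass in Da
    intensity: float  # assumed positive
    source: str       # deconvolution algorithm that reported the peak


def cluster_peaks(peaks, tol_ppm=10.0):
    """Greedily cluster peaks sorted by mass (hypothetical helper).

    A peak joins the current cluster when it lies within tol_ppm of the
    previous (lighter) peak; otherwise it starts a new cluster. This is a
    single-linkage rule, so long chains of close peaks can merge.
    """
    clusters = []
    for peak in sorted(peaks, key=lambda p: p.mass):
        if clusters:
            prev = clusters[-1][-1]
            if (peak.mass - prev.mass) / prev.mass * 1e6 <= tol_ppm:
                clusters[-1].append(peak)
                continue
        clusters.append([peak])
    return clusters


def vote_consensus(peak_lists, min_votes=2, tol_ppm=10.0):
    """Return (consensus_mass, n_algorithms) for clusters with enough votes.

    peak_lists maps an algorithm name to a list of (mass, intensity) pairs.
    """
    peaks = [
        Peak(mass, intensity, name)
        for name, pairs in peak_lists.items()
        for mass, intensity in pairs
    ]
    consensus = []
    for cluster in cluster_peaks(peaks, tol_ppm):
        votes = len({p.source for p in cluster})  # count distinct algorithms only
        if votes >= min_votes:
            total = sum(p.intensity for p in cluster)
            # Intensity-weighted mean mass as the consensus value (an assumption).
            mass = sum(p.mass * p.intensity for p in cluster) / total
            consensus.append((round(mass, 4), votes))
    return consensus


# Invented peak lists, keyed by the algorithm names from the description.
example = {
    "THRASH": [(10000.00, 5.0), (12000.00, 3.0)],
    "TopFD": [(10000.05, 4.0), (15000.00, 2.0)],
    "MS-Deconv": [(10000.02, 6.0)],
    "SNAP": [(12000.10, 1.0)],
}

for mass, votes in vote_consensus(example, min_votes=2):
    print(f"{mass:.4f} Da supported by {votes} algorithms")
```

With `min_votes=2`, the 15000 Da peak reported only by TopFD is dropped, while the clusters near 10000 Da and 12000 Da survive. Sweeping `min_votes` from 1 to the number of algorithms yields consensus lists at different stringency thresholds, trading sensitivity for precision.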

Created:

2020-05-06