Re-Fraction: A Machine Learning Approach for Deterministic Identification of Protein Homologues and Splice Variants in Large-scale MS-based Proteomics
Source: NIAID Data Ecosystem (indexed 2026-03-07)
Download link:
https://figshare.com/articles/dataset/Re_Fraction_A_Machine_Learning_Approach_for_Deterministic_Identification_of_Protein_Homologues_and_Splice_Variants_in_Large_scale_MS_based_Proteomics/2525872
Description:
A key step in the analysis of mass spectrometry (MS)-based proteomics
data is the inference of proteins from identified peptide sequences.
Here we describe Re-Fraction, a novel machine learning algorithm that
enhances deterministic protein identification. Re-Fraction utilizes
several protein physical properties to assign proteins to expected
protein fractions that comprise large-scale MS-based proteomics data.
This information is then used to appropriately assign peptides to
specific proteins. This approach is sensitive, highly specific, and
computationally efficient. We provide algorithms and source code for
the current version of Re-Fraction, which accepts output tables from
the MaxQuant environment. Nevertheless, the principles behind Re-Fraction
can be applied to other protein identification pipelines where data
are generated from samples fractionated at the protein level. We demonstrate
the utility of this approach through reanalysis of data from a previously
published study and generate lists of proteins deterministically identified
by Re-Fraction that were previously only identified as members of
a protein group. We find that this approach is particularly useful
in resolving protein groups composed of splice variants and homologues,
which are frequently expressed in a cell- or tissue-specific manner
and may have important biological consequences.
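
The core idea in the description (use a protein's physical properties to predict which fraction it should appear in, then use the fraction to resolve ambiguous protein groups) can be illustrated with a toy sketch. This is not the published Re-Fraction algorithm or its source code. It assumes, purely for illustration, that the only property is molecular mass, that fractions are numbered gel slices, and that a simple median-plus-tolerance rule stands in for the machine learning model. The names `learn_fraction_centers`, `resolve_group`, and `tolerance` are hypothetical, and no MaxQuant table parsing is attempted.

```python
from statistics import median


def learn_fraction_centers(unambiguous):
    """Estimate a typical mass (kDa) for each fraction.

    `unambiguous` is an iterable of (fraction_id, mass_kda) pairs taken from
    proteins that were identified without any group ambiguity (assumption).
    """
    by_fraction = {}
    for fraction, mass in unambiguous:
        by_fraction.setdefault(fraction, []).append(mass)
    return {fraction: median(masses) for fraction, masses in by_fraction.items()}


def resolve_group(group, observed_fraction, centers, tolerance=0.25):
    """Return the one group member whose mass fits the observed fraction.

    `group` maps protein_id -> mass_kda. A member "fits" when its mass lies
    within `tolerance` (relative) of the fraction's learned center. If zero
    or several members fit, the group stays unresolved and None is returned.
    """
    center = centers[observed_fraction]
    fits = [
        protein_id
        for protein_id, mass in group.items()
        if abs(mass - center) / center <= tolerance
    ]
    return fits[0] if len(fits) == 1 else None


# Hypothetical training data: (fraction, mass in kDa)
training = [(1, 100.0), (1, 110.0), (1, 105.0), (2, 50.0), (2, 55.0), (2, 45.0)]
centers = learn_fraction_centers(training)

# A hypothetical protein group made of two splice variants of different mass
splice_variants = {"P1-full": 102.0, "P1-short": 60.0}
resolved = resolve_group(splice_variants, observed_fraction=1, centers=centers)
```

In this sketch, peptides shared by both splice variants but observed in fraction 1 would be attributed to the longer variant alone, while two homologues of nearly identical mass would remain an unresolved group.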
Created:
2016-02-21



