
Re-Fraction: A Machine Learning Approach for Deterministic Identification of Protein Homologues and Splice Variants in Large-scale MS-based Proteomics

Source: NIAID Data Ecosystem (indexed 2026-03-07)
Download link:
https://figshare.com/articles/dataset/Re_Fraction_A_Machine_Learning_Approach_for_Deterministic_Identification_of_Protein_Homologues_and_Splice_Variants_in_Large_scale_MS_based_Proteomics/2525872
Description:
A key step in the analysis of mass spectrometry (MS)-based proteomics data is the inference of proteins from identified peptide sequences. Here we describe Re-Fraction, a novel machine learning algorithm that enhances deterministic protein identification. Re-Fraction utilizes several protein physical properties to assign proteins to expected protein fractions that comprise large-scale MS-based proteomics data. This information is then used to appropriately assign peptides to specific proteins. This approach is sensitive, highly specific, and computationally efficient. We provide algorithms and source code for the current version of Re-Fraction, which accepts output tables from the MaxQuant environment. Nevertheless, the principles behind Re-Fraction can be applied to other protein identification pipelines where data are generated from samples fractionated at the protein level. We demonstrate the utility of this approach through reanalysis of data from a previously published study and generate lists of proteins deterministically identified by Re-Fraction that were previously only identified as members of a protein group. We find that this approach is particularly useful in resolving protein groups composed of splice variants and homologues, which are frequently expressed in a cell- or tissue-specific manner and may have important biological consequences.
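The fraction-matching idea in the description can be illustrated with a minimal sketch. It is not the Re-Fraction implementation: the machine learning step is replaced here by a fixed lookup on molecular mass, and the names `Candidate`, `expected_fraction`, `resolve_group`, the mass boundaries, and the protein identifiers are all hypothetical. The description only states that "several protein physical properties" are used, so the choice of mass as the single property is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One member of an ambiguous protein group (hypothetical structure)."""
    protein_id: str
    mass_kda: float  # assumed physical property; the real method uses several


def expected_fraction(mass_kda: float, upper_bounds: Sequence[float]) -> int:
    """Map a mass to a fraction index using ascending upper mass limits.

    Stand-in for the learned fraction predictor: a plain threshold lookup.
    Masses above the last bound fall into one extra, final fraction.
    """
    for index, upper in enumerate(upper_bounds):
        if mass_kda <= upper:
            return index
    return len(upper_bounds)


def resolve_group(
    candidates: Sequence[Candidate],
    observed_fraction: int,
    upper_bounds: Sequence[float],
    tolerance: int = 0,
) -> Optional[str]:
    """Return the single candidate consistent with the observed fraction.

    If zero or several candidates are consistent, the group stays
    unresolved and None is returned.
    """
    consistent = [
        c for c in candidates
        if abs(expected_fraction(c.mass_kda, upper_bounds) - observed_fraction)
        <= tolerance
    ]
    return consistent[0].protein_id if len(consistent) == 1 else None


# Hypothetical fraction boundaries (kDa) and a two-member protein group.
bounds = [25.0, 50.0, 100.0]
group = [Candidate("ISOFORM_A", 42.0), Candidate("ISOFORM_B", 78.0)]

resolved = resolve_group(group, observed_fraction=2, upper_bounds=bounds)
```

With these made-up values, the shared peptides observed in fraction 2 are consistent only with the heavier isoform, so the group resolves to one protein. When both members are expected in the same fraction, or neither is, the function returns `None` and the group remains ambiguous.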

Created: 2016-02-21