Machine Learning on Large-Scale Proteomics Data Identifies Tissue and Cell-Type Specific Proteins
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://figshare.com/articles/dataset/Machine_Learning_on_Large-Scale_Proteomics_Data_Identifies_Tissue_and_Cell-Type_Specific_Proteins/22335786
Description:
Using data from 183 public human data sets from PRIDE, a machine learning
model was trained to identify tissue and cell-type specific
protein patterns. PRIDE projects were searched with ionbot, and tissue and
cell-type annotations were added manually. Data from physiological samples
were used to train a Random Forest model on protein abundances to
classify samples into tissues and cell types. Subsequently, a one-vs-all
classification and feature importance were used to analyze the most
discriminating protein abundances per class. Based on protein abundance
alone, the model was able to predict tissues with 98% accuracy, and
cell types with 99% accuracy. The F-scores give a clear view of
tissue-specific proteins and tissue-specific protein expression patterns.
In-depth feature analysis shows slight confusion between physiologically
similar tissues, demonstrating the capacity of the algorithm to detect
biologically relevant patterns. These results can in turn inform downstream
uses, from identification of the tissue of origin of proteins in complex
samples such as liquid biopsies, to studying the proteome of tissue-like
samples such as organoids and cell lines.
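
The workflow described above (a Random Forest trained on protein abundances, followed by one-vs-all classification and feature importance per class) can be illustrated with a minimal sketch. It assumes scikit-learn, which the description does not name, and uses synthetic abundances rather than the PRIDE data. The tissue labels, the planted marker proteins, the helper names `make_toy_abundances` and `top_proteins_one_vs_all`, and all hyperparameters are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def make_toy_abundances(n_per_class=40, n_proteins=30, seed=0):
    """Synthetic stand-in for a samples x proteins abundance matrix.

    Assumption: protein column i is a planted marker for tissue i.
    """
    rng = np.random.default_rng(seed)
    tissues = ["liver", "brain", "heart"]  # hypothetical labels
    blocks, labels = [], []
    for idx, tissue in enumerate(tissues):
        block = rng.normal(0.0, 1.0, size=(n_per_class, n_proteins))
        block[:, idx] += 5.0  # raise the marker protein for this tissue
        blocks.append(block)
        labels += [tissue] * n_per_class
    return np.vstack(blocks), np.array(labels), tissues


def top_proteins_one_vs_all(X, y, target, n_top=3, seed=0):
    """Fit a binary (target vs. rest) forest and rank proteins by importance."""
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X, (y == target).astype(int))
    order = np.argsort(clf.feature_importances_)[::-1]
    return order[:n_top].tolist()


X, y, tissues = make_toy_abundances()

# Multi-class step: classify held-out samples into tissues.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
multi = RandomForestClassifier(n_estimators=200, random_state=0)
multi.fit(X_train, y_train)
accuracy = multi.score(X_test, y_test)

# One-vs-all step: most discriminating protein columns per tissue.
top_by_tissue = {t: top_proteins_one_vs_all(X, y, t) for t in tissues}

print("held-out accuracy:", accuracy)
print("top protein columns per tissue:", top_by_tissue)
```

On real data, `X` would be the matrix of protein abundances per sample and `y` the manually curated tissue or cell-type annotation. The impurity-based `feature_importances_` used here is one of several possible importance measures and is not necessarily the one behind the reported results.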
Created:
2023-03-24



