Sketch Stochastic Dictionary Learning Ecoli_DIA dataset
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/Sketch_Stochastic_Dictionary_Learning_Ecoli_DIA_dataset/13589621
Description:
This folder contains the validation dataset used in the following paper, submitted to the journal Statistical Analysis and Data Mining: Olga Permiakova, Thomas Burger. Sketched Stochastic Dictionary Learning for large-scale data and application to large-scale mass spectrometry data. January 2021.

The dataset consists of proteomic data resulting from a liquid chromatography and mass spectrometry analysis of an Escherichia coli (E. coli) sample. The experimental details of the proteomics analysis can be found in [1]. For the experiments reported in the Statistical Analysis and Data Mining submission, only a subset of the data was used: the data acquired from 10 to 30 minutes (out of two hours). These data have been preprocessed and formatted as a matrix, as described in the supplemental material of [1]. The matrix columns represent the chromatographic profiles acquired along the sample elution. The matrix row indices correspond to the discrete elution time stamps. The resulting data matrix contains 256 rows and 74,193 columns.

The data matrix is stored as an object of the Filebacked Big Matrix (FBM) class of the bigstatsr R package (https://github.com/privefl/bigstatsr). The associated binary files are Ecoli_DIA.bk (151.9 MB) and Ecoli_DIA.rds (33.8 KB). To access the dataset in R, both files must be in the same folder.

[1] Olga Permiakova, Romain Guibert, Alexandra Kraut, Thomas Fortin, Anne-Marie Hesse, Thomas Burger. CHICKN: Extraction of peptide chromatographic elution profiles from large scale mass spectrometry data by means of Wasserstein compressive hierarchical cluster analysis. Accepted for publication in BMC Bioinformatics, January 2021.
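The stated dimensions and the size of the backing file can be cross-checked with a little arithmetic. The sketch below is in Python and uses only the standard library. It rests on assumptions that the description does not state: that Ecoli_DIA.bk holds the matrix as 8-byte doubles, in column-major order, with no header. The helper names `expected_backing_size` and `read_column` are hypothetical, introduced for illustration only. The route the description itself gives remains R with bigstatsr.

```python
import struct

N_ROWS = 256         # discrete elution time stamps (from the description)
N_COLS = 74_193      # chromatographic profiles (from the description)
BYTES_PER_VALUE = 8  # ASSUMPTION: values are stored as 8-byte doubles


def expected_backing_size(n_rows=N_ROWS, n_cols=N_COLS, item_size=BYTES_PER_VALUE):
    """Size in bytes of a dense, header-less matrix with the given shape."""
    return n_rows * n_cols * item_size


def read_column(path, j, n_rows=N_ROWS):
    """Read column j (0-based) from a raw backing file.

    ASSUMPTION: column-major layout, native-endian doubles, no header.
    """
    n_bytes = n_rows * BYTES_PER_VALUE
    with open(path, "rb") as fh:
        fh.seek(j * n_bytes)  # columns are contiguous under column-major order
        raw = fh.read(n_bytes)
    if len(raw) != n_bytes:
        raise ValueError("file too short for the requested column")
    return list(struct.unpack(f"={n_rows}d", raw))


size = expected_backing_size()
print(size, round(size / 1e6, 1))  # 151947264 151.9
```

Under these assumptions, 256 × 74,193 × 8 bytes comes to about 151.9 million bytes, which agrees with the size listed for Ecoli_DIA.bk. If the file were present locally, `read_column("Ecoli_DIA.bk", 0)` would return the first chromatographic profile as a list of 256 values, provided the layout assumption holds.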
Created:
2021-01-16



