Datasets for paper "Label-free data standardization for clinical metabolomics".
Source: Figshare. Updated: 2017-03-11. Indexed: 2026-04-08.
Download link:
https://figshare.com/articles/dataset/Datasets_for_paper_Global_data_standardization_algorithm_for_applied_metabolomics_/3153982
Description:
**1. Mass lists**
File name format: `Sample_(sample number)_(mass spectrometer used).txt`. Archive: `Mass lists.zip`.
Basic set: Sample_1_maXis.txt, Sample_2_maXis.txt, Sample_3_maXis.txt, Sample_1_Apex_Ultra.txt, Sample_2_Apex_Ultra.txt, Sample_3_Apex_Ultra.txt, Sample_1_OrbiTrap_Elite.txt, Sample_2_OrbiTrap_Elite.txt, Sample_3_OrbiTrap_Elite.txt, Sample_1_micrOTOF-Q.txt, Sample_2_micrOTOF-Q.txt, Sample_3_micrOTOF-Q.txt, Sample_1_IFunnel_Q-ToF.txt, Sample_2_IFunnel_Q-ToF.txt, Sample_3_IFunnel_Q-ToF.txt.
Additional set: Sample_3_maXis_Wide_range.txt, Sample_3_maXis_High_range.txt.
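The naming scheme above can be split into its parts programmatically. The following is a minimal Python sketch, assuming the pattern inferred from the listed file names holds for every file in the archives; `parse_name` and `MassListName` are hypothetical helpers, not part of the dataset.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the file names listed above; not an official specification.
_NAME_RE = re.compile(r"^Sample_(\d+)_(.+)\.(txt|mzXL)$")


class MassListName(NamedTuple):
    sample: int        # sample number, e.g. 3
    instrument: str    # everything between the sample number and the extension
    extension: str     # "txt" or "mzXL"


def parse_name(file_name: str) -> Optional[MassListName]:
    """Split a file name into sample number, instrument label and extension."""
    match = _NAME_RE.match(file_name)
    if match is None:
        return None  # the name does not follow the Sample_<n>_<instrument> scheme
    sample, instrument, extension = match.groups()
    return MassListName(int(sample), instrument, extension)


print(parse_name("Sample_2_OrbiTrap_Elite.txt"))
print(parse_name("Sample_3_maXis_Wide_range.txt"))
```

For the additional set, the range suffix (`Wide_range`, `High_range`) stays part of the instrument label, because the file names give no separate field for it.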
**2. Raw mass spectra in mzXL format**
File name format: `Sample_(sample number)_(mass spectrometer used).mzXL`. Archive: `Mass spectra.zip`.
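**3. Data for the basic set of mass spectra (file `matrix.mat`)**
Data are stored in MATLAB workspace format and include the following variables:
- `mz`: *m/z* values for the mass peak intensities;
- `intensities`: mass peak intensities, where columns correspond to mass lists (basic set) and rows correspond to *m/z* values;
- `normalization_curves`: normalization curves for the mass lists, where columns correspond to mass lists (basic set) and rows correspond to *m/z* values;
- `standardized_intensities`: standardized intensities for the mass lists, where columns correspond to mass lists (basic set) and rows correspond to *m/z* values.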
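The layout described above (rows are *m/z* values, columns are mass lists) can be checked after loading the file outside MATLAB. This is a rough Python sketch, assuming SciPy is installed and that the file was saved in a format `scipy.io.loadmat` can read (it does not read MATLAB v7.3 files); `load_workspace` and `check_layout` are hypothetical helper names.

```python
from typing import Any, Dict, Mapping, Tuple

MATRIX_NAMES = ("intensities", "normalization_curves", "standardized_intensities")


def load_workspace(path: str) -> Dict[str, Any]:
    """Read a .mat file into a dict of arrays (requires SciPy)."""
    from scipy.io import loadmat  # imported lazily so check_layout works without SciPy

    # squeeze_me=True turns N-by-1 or 1-by-N MATLAB vectors into 1-D arrays.
    data = loadmat(path, squeeze_me=True)
    # loadmat adds bookkeeping keys such as "__header__"; drop them.
    return {key: value for key, value in data.items() if not key.startswith("__")}


def check_layout(ws: Mapping[str, Any]) -> Tuple[int, int]:
    """Verify rows = m/z values and columns = mass lists; return (n_mz, n_lists)."""
    n_mz = ws["mz"].shape[0]
    shapes = {name: tuple(ws[name].shape) for name in MATRIX_NAMES}
    n_lists = shapes["intensities"][1]
    for name, shape in shapes.items():
        if shape != (n_mz, n_lists):
            raise ValueError(f"{name} has shape {shape}, expected {(n_mz, n_lists)}")
    return n_mz, n_lists


# Usage, assuming matrix.mat sits in the working directory:
# ws = load_workspace("matrix.mat")
# print(check_layout(ws))
```

Since the basic set lists 15 mass lists, the second number returned for `matrix.mat` would be expected to be 15; the same check applies unchanged to `matrix_ad.mat`.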
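**4. Data for the additional set of mass spectra (file `matrix_ad.mat`)**
Data are stored in MATLAB workspace format and include the following variables:
- `mz`: *m/z* values for the mass peak intensities;
- `intensities`: mass peak intensities, where columns correspond to mass lists (additional set) and rows correspond to *m/z* values;
- `normalization_curves`: normalization curves for the mass lists, where columns correspond to mass lists (additional set) and rows correspond to *m/z* values;
- `standardized_intensities`: standardized intensities for the mass lists, where columns correspond to mass lists (additional set) and rows correspond to *m/z* values.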
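**5. Dataset for Figure 1 (file `Dataset for Figure 1.xlsx`)**
The file is in Microsoft Excel format and contains the data for Figure 1.
**6. Dataset for Figure 3 (file `Dataset data for Figure 3.xlsx`)**
The file is in Microsoft Excel format and contains the data for Figure 3.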
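**7. Source code for the SantaOmics algorithm and the data to run it**
The source code is provided as a MATLAB script (file `SantaOmics.m`). The data are provided as a saved MATLAB workspace (file `workspace.mat`). To run the SantaOmics algorithm, load the workspace in MATLAB and then run `SantaOmics.m` in the MATLAB environment. The script standardizes the mass peak intensities of the initial mass spectra (variable `intensities`; *n* = 15) and writes the result to the variable `standardized_intensities`. Depending on the computer's performance, the algorithm takes from several minutes to tens of minutes to complete.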
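For unattended runs, the load-then-run procedure can be wrapped in a single non-interactive MATLAB call. The Python sketch below only builds such a command. It assumes a MATLAB release that supports the `-batch` startup option (R2019a or later), that `matlab` is on the `PATH`, and that the script does not clear the workspace it is given. The output file name `standardized.mat` and the helper `matlab_batch_command` are hypothetical.

```python
import shlex
from typing import List


def matlab_batch_command(workspace: str = "workspace.mat",
                         script: str = "SantaOmics.m",
                         output: str = "standardized.mat") -> List[str]:
    """Build a `matlab -batch` command: load the workspace, run the script, save the result."""
    # File names are inserted as-is, so they must not contain single quotes.
    statements = (
        f"load('{workspace}'); "
        f"run('{script}'); "
        # A batch session discards its workspace on exit, so the result is
        # saved explicitly; the output file name is an assumption.
        f"save('{output}', 'standardized_intensities');"
    )
    return ["matlab", "-batch", statements]


cmd = matlab_batch_command()
print(shlex.join(cmd))  # shell-quoted form, for inspection or copy-paste
# To execute (may take tens of minutes): subprocess.run(cmd, check=True)
```

Running the command from the directory that holds `workspace.mat` and `SantaOmics.m` keeps the relative file names valid.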
Created:
2016-04-05



