Data used in "Fast metabolite identification with Input Output Kernel Regression"
Source: Mendeley Data. Updated 2024-06-25; indexed 2024-06-27.
Download link:
https://zenodo.org/record/804241
Resource description:
This repository contains the data used in [1] to evaluate metabolite identification performance from tandem mass spectra. These data were extracted and processed in [2]. We used a subset of 4138 MS/MS spectra extracted from the GNPS public spectral library (https://gnps.ucsd.edu/ProteoSAFe/libraries.jsp) for training and evaluation. For searching, we used molecular structures from PubChem as candidate sets. Please mention and cite GNPS when using these data.
The implementation of the method proposed in [1] is available at: https://version.aalto.fi/gitlab/kepaco/Fast-metabolite-identification-with-IOKR
File descriptions:
- spectra.txt: information about the MS/MS spectra (GNPS identifier, compound name and InChI identifier)
- data_GNPS.mat: the molecular fingerprints, molecular formulas and InChI identifiers corresponding to the MS/MS spectra
- cv_ind.txt: indices of the cross-validation folds
- ind_eval.txt: indices of the examples used for evaluation
- candidates: fingerprints and InChI identifiers for the different candidate sets
- input_kernels: 24 input kernel matrices
References:
[1] Brouard, C., Shen, H., Dührkop, K., d'Alché-Buc, F., Böcker, S. and Rousu, J.: Fast metabolite identification with Input Output Kernel Regression. In the proceedings of ISMB 2016, Bioinformatics 32(12): i28-i36, 2016. DOI: https://doi.org/10.1093/bioinformatics/btw246
[2] Dührkop, K., Shen, H., Meusel, M., Rousu, J. and Böcker, S.: Searching molecular structure databases with tandem mass spectra using CSI:FingerID. PNAS, 112(41): 12580-12585, 2015. DOI: https://doi.org/10.1073/pnas.1509788112
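The description above lists cv_ind.txt as holding the indices of the cross-validation folds. The following is a minimal Python sketch of how such a file could be turned into train/test splits. It assumes, without confirmation from the dataset itself, that the file stores one integer fold label per example, in the same order as the spectra. The function names and the sample values are hypothetical and chosen only for illustration; the actual file layout should be checked before relying on this.

```python
from collections import defaultdict


def parse_indices(text):
    """Parse whitespace-separated integers from the text of an index file."""
    return [int(token) for token in text.split()]


def folds_from_labels(fold_labels):
    """Group 0-based example positions by their fold label."""
    folds = defaultdict(list)
    for position, label in enumerate(fold_labels):
        folds[label].append(position)
    return dict(folds)


def train_test_splits(fold_labels):
    """Yield (fold_label, train_positions, test_positions) for each fold."""
    folds = folds_from_labels(fold_labels)
    for label in sorted(folds):
        held_out = set(folds[label])
        train = [i for i in range(len(fold_labels)) if i not in held_out]
        yield label, train, folds[label]


# Hypothetical stand-in for the contents of cv_ind.txt (ASSUMED format:
# one fold label per line, one line per spectrum). Not real data.
example_cv_text = "1\n2\n1\n3\n2\n3\n"
labels = parse_indices(example_cv_text)
splits = list(train_test_splits(labels))
```

With the real file, the text could be read with `open("cv_ind.txt").read()` and passed to `parse_indices`. The same parser could be applied to ind_eval.txt, again assuming it is a plain list of integers; whether those indices are 0-based or 1-based is not stated in the description and would need to be verified.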
Created:
2023-06-28



