
Data used in "Fast metabolite identification with Input Output Kernel Regression"

Mendeley Data. Updated 2024-06-25; indexed 2024-06-27.
Download link:
https://zenodo.org/record/804241
Description:
This repository contains the data used in [1] to evaluate metabolite identification performance from tandem mass spectra (MS/MS). The data were extracted and processed in [2]. We used a subset of 4138 MS/MS spectra from the GNPS public spectral library (https://gnps.ucsd.edu/ProteoSAFe/libraries.jsp) for training and evaluation. For searching, we used molecular structures from PubChem as candidate sets. Please mention and cite GNPS when using these data.

The implementation of the method proposed in [1] is available at: https://version.aalto.fi/gitlab/kepaco/Fast-metabolite-identification-with-IOKR

File descriptions:
- spectra.txt: information about the MS/MS spectra (GNPS identifier, compound name, and InChI identifier)
- data_GNPS.mat: molecular fingerprints, molecular formulas, and InChIs corresponding to the MS/MS spectra
- cv_ind.txt: indices of the cross-validation folds
- ind_eval.txt: indices of the examples used for evaluation
- candidates: fingerprints and InChIs for the different candidate sets
- input_kernels: 24 input kernel matrices

References:
[1] Brouard, C., Shen, H., Dührkop, K., d'Alché-Buc, F., Böcker, S. and Rousu, J.: Fast metabolite identification with Input Output Kernel Regression. In the proceedings of ISMB 2016, Bioinformatics 32(12): i28-i36, 2016. DOI: https://doi.org/10.1093/bioinformatics/btw246
[2] Dührkop, K., Shen, H., Meusel, M., Rousu, J. and Böcker, S.: Searching molecular structure databases with tandem mass spectra using CSI:FingerID. PNAS 112(41): 12580-12585, 2015. DOI: https://doi.org/10.1073/pnas.1509788112
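As an illustration of how the cross-validation indices in cv_ind.txt might be consumed, here is a minimal Python sketch. It assumes the file holds one integer fold label per line, one line per spectrum; that layout is not confirmed by the description and should be checked against the actual file. The helper names `parse_fold_labels` and `folds_to_splits` are hypothetical and are not part of the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_fold_labels(lines: Iterable[str]) -> List[int]:
    """Parse one integer fold label per non-empty line (assumed file layout)."""
    return [int(line.strip()) for line in lines if line.strip()]


def folds_to_splits(labels: List[int]) -> Dict[int, Tuple[List[int], List[int]]]:
    """Map each fold label to (train_indices, test_indices), using 0-based positions."""
    by_fold = defaultdict(list)
    for idx, fold in enumerate(labels):
        by_fold[fold].append(idx)

    splits = {}
    for fold, test_idx in by_fold.items():
        held_out = set(test_idx)
        # Every example not in the held-out fold is used for training.
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        splits[fold] = (train_idx, test_idx)
    return splits


# Synthetic stand-in for the contents of cv_ind.txt (not real data).
example_lines = ["1", "2", "1", "3", "2", "3"]
splits = folds_to_splits(parse_fold_labels(example_lines))
```

With the real file, the lines could be supplied by `open("cv_ind.txt")`. Because the accompanying data are distributed as a MATLAB .mat file, the indices in ind_eval.txt may be 1-based; that too is an assumption to verify before using them to index Python arrays.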

Created:
2023-06-28