CANOPUS evaluation data
Source: DataCite Commons | Updated: 2025-06-01 | Indexed: 2024-08-18
Download link:
https://figshare.com/articles/dataset/CANOPUS_evaluation_data/13073051/1
Description:
We evaluated CANOPUS on two MS/MS reference datasets: the SVM training dataset, which was also used for training CSI:FingerID (in 10-fold cross-validation), and the Agilent MassHunter library, used as an independent dataset.
The SVM training dataset contains spectra from GNPS, MassBank, and NIST17. As NIST17 is a commercial library, we can only provide the spectra from GNPS and MassBank. Here, we provide the public part of the SVM training dataset (svm_training_data.zip).

For training the deep neural network we used a subset of PubChem with 1,106,938 structures, for which we downloaded ClassyFire annotations (Feunang et al. 2016), and another set of 2,997,933 compounds from the ClassyFire database. The PubChem structures, together with ClassyFire annotations for the evaluation data, are available as "structures.csv.gz".
With CANOPUS, we analyzed data from two biological studies; the mzML and mzXML files are available at MassIVE (https://massive.ucsd.edu/) with the accession numbers MSV000079949 (mice data, Quinn et al. 2020) and MSV000081082 (Euphorbia plant data, Ernst et al. 2019). The network visualization of the mice data was done using Cytoscape (Shannon et al. 2003). Here, we provide the Cytoscape file (mice_multiple_classes.cys). The source code of CANOPUS is part of the SIRIUS GitHub repository (https://github.com/boecker-lab/sirius-libs). The scripts we used for analyzing and visualizing the data are available at the GitHub repository (https://github.com/kaibioinfo/canopus_treemap).

See the LICENSE.txt for further licensing information on ClassyFire annotations and mass spectra.
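The description names two downloadable files, svm_training_data.zip and structures.csv.gz, but does not document their internal layout. The following is a minimal Python sketch for a first look at both files after downloading them. It assumes the CSV is comma-separated and UTF-8 encoded, which the description does not confirm, and it makes no assumption about column names or archive contents. The helper names `preview_gzipped_csv` and `list_zip_members` are illustrative only and are not part of CANOPUS or SIRIUS.

```python
import csv
import gzip
import zipfile
from itertools import islice
from pathlib import Path


def preview_gzipped_csv(path, n_rows=5):
    """Return (header, first n_rows data rows) of a gzip-compressed CSV file.

    Assumption: comma-separated, UTF-8 text. If the file turns out to be
    tab-separated, pass delimiter="\t" to csv.reader instead.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])            # empty list if the file is empty
        rows = list(islice(reader, n_rows))  # read only the first few rows
    return header, rows


def list_zip_members(path):
    """Return the names of all entries in a zip archive without extracting it."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


if __name__ == "__main__":
    # File names as given in the dataset description; the local paths are assumed.
    structures = Path("structures.csv.gz")
    spectra = Path("svm_training_data.zip")

    if structures.exists():
        header, rows = preview_gzipped_csv(structures)
        print("columns:", header)
        for row in rows:
            print(row)

    if spectra.exists():
        members = list_zip_members(spectra)
        print(len(members), "entries, first few:", members[:5])
```

Reading only the first rows keeps the preview cheap, since the structure file may hold over a million records. Check LICENSE.txt before redistributing anything extracted from either file.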
Provider:
figshare
Created:
2020-10-15
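Summary
Dataset introduction

Background and challenges
Background overview
The CANOPUS evaluation dataset supports the evaluation of CANOPUS on two MS/MS reference datasets: the SVM training dataset, whose public part is provided, and the Agilent MassHunter library. It also provides PubChem structures with ClassyFire annotations, and it points to the mass spectrometry data files of two biological studies, which are hosted at MassIVE.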



