CANOPUS evaluation data
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/CANOPUS_evaluation_data/13073051
Description:
We evaluated CANOPUS on two MS/MS reference datasets: the SVM training dataset, which was also used for training CSI:FingerID (in 10-fold cross-validation), and the Agilent MassHunter library, used as an independent dataset.
The SVM training dataset contains spectra from GNPS, MassBank, and NIST17. As NIST17 is a commercial library, we can only provide the spectra from GNPS and MassBank. Here, we provide the public part of the SVM training dataset (svm_training_data.zip).
For training the deep neural network, we used a subset of PubChem with 1,106,938 structures for which we downloaded ClassyFire annotations (Feunang et al 2016) and another set of 2,997,933 compounds from the ClassyFire database. The PubChem structures, together with ClassyFire annotations for the evaluation data, are available as "structures.csv.gz".
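Since the column layout of "structures.csv.gz" is not described here, a first step after downloading is to inspect the header and a few rows. The following is a minimal sketch using only the Python standard library. The helper name `peek_csv_gz`, the comma delimiter, and the UTF-8 encoding are assumptions for illustration and are not documented by the dataset.

```python
import csv
import gzip
from itertools import islice


def peek_csv_gz(path, n_rows=5, delimiter=","):
    """Return (header, rows) for the first n_rows data rows of a gzipped CSV.

    Assumptions (not stated by the dataset): the file is UTF-8 text,
    comma-delimited, and its first line is a header row.
    """
    # "rt" decompresses and decodes on the fly; newline="" is what the
    # csv module expects so that it can handle line endings itself.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, [])          # empty list if the file is empty
        rows = list(islice(reader, n_rows))  # read only the rows requested
    return header, rows


if __name__ == "__main__":
    # Hypothetical local path to the downloaded file.
    header, rows = peek_csv_gz("structures.csv.gz")
    print(header)
    for row in rows:
        print(row)
```

Because only the first rows are read, this avoids loading the full table into memory. If the file turns out to be tab-separated, passing `delimiter="\t"` should be enough.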
With CANOPUS, we analyzed data from two biological studies; the mzML and mzXML files are available at MassIVE (https://massive.ucsd.edu/) with the accession numbers MSV000079949 (mice data, Quinn et al 2020) and MSV000081082 (Euphorbia plant data, Ernst et al 2019).
The network visualization of the mice data was done using Cytoscape (Shannon et al 2003). Here, we provide the Cytoscape file (mice_multiple_classes.cys).
The source code of CANOPUS is part of the SIRIUS GitHub repository (https://github.com/boecker-lab/sirius-libs). The scripts we used for analyzing and visualizing the data are available at the GitHub repository (https://github.com/kaibioinfo/canopus_treemap).
See the LICENSE.txt for further licensing information on ClassyFire annotations and mass spectra.
Created:
2020-10-15



