FlexMS Benchmark Datasets: Mass Spectrum Prediction for Metabolomics
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/FlexMS_Benchmark_Datasets_Mass_Spectrum_Prediction_for_Metabolomics/31144573
Description:
This dataset accompanies the paper "FlexMS: A Flexible Framework for Benchmarking Deep Learning-based Mass Spectrum Prediction Tools in Metabolomics."

Overview
FlexMS is a flexible framework for constructing and evaluating deep learning models for tandem mass spectrum (MS/MS) prediction. This repository contains the preprocessed benchmark datasets used in the paper.

Dataset Contents
The archive contains four benchmark datasets in FlexMS-compatible CSV format:

1. GNPS (Global Natural Products Social Molecular Networking)
Size: ~322,000 spectra from ~16,000 molecules
Splits: Random and Scaffold
Files: train.csv, valid.csv, test.csv

2. MassBank
Size: ~62,000 spectra from ~4,000 molecules
Split: Scaffold
Files: train.csv, valid.csv, test.csv

3. MassSpecGym
Size: ~231,000 spectra from ~29,000 molecules
Split: Predefined by original dataset
Files: MassSpecGym_train.csv, MassSpecGym_valid.csv, MassSpecGym_test.csv

4. MIST-CANOPUS (NPLIB1)
Size: ~8,000 spectra from ~7,000 molecules
Splits: 3-fold cross-validation
Files: split_0/, split_1/, split_2/ (each with train, val, test)

Data Format
Each CSV file contains the following columns:
spec_id: Spectrum identifier
mol_id: Molecule identifier
smiles: Molecular SMILES string
inchikey_s: InChIKey
formula: Molecular formula
mw: Molecular weight
prec_type: Precursor ion type (e.g., [M+H]+)
inst_type: Instrument type (e.g., QTOF, Orbitrap)
ion_mode: Ionization mode (positive/negative)
ace/nce: Absolute/Normalized collision energy
prec_mz: Precursor m/z
peaks: List of (m/z, intensity) pairs

Usage
Extract the archive and use with the FlexMS framework:
tar -xzvf FlexMS_data.tar.gz
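
Once extracted, the CSV files can also be inspected outside the framework. The following is a minimal Python sketch, not part of FlexMS itself. It assumes that the peaks column stores a Python- or JSON-style list literal such as "[(29.04, 100.0), (45.03, 20.5)]"; the listing does not specify the exact encoding, so parse_peaks may need adjusting. The helper names (parse_peaks, load_spectra) and the sample row are hypothetical, and the sample uses only a subset of the documented columns with made-up values.

```python
import ast
import csv
import io


def parse_peaks(raw):
    """Parse one peaks cell into a list of (m/z, intensity) float tuples.

    Assumption: the cell is a list literal of pairs, e.g.
    "[(29.04, 100.0), (45.03, 20.5)]" or "[[29.04, 100.0], [45.03, 20.5]]".
    """
    pairs = ast.literal_eval(raw)
    return [(float(mz), float(intensity)) for mz, intensity in pairs]


def load_spectra(handle):
    """Read rows from an open text handle, converting prec_mz and peaks."""
    spectra = []
    for row in csv.DictReader(handle):
        row["prec_mz"] = float(row["prec_mz"])
        row["peaks"] = parse_peaks(row["peaks"])
        spectra.append(row)
    return spectra


# Hypothetical in-memory sample; values are illustrative, not taken from the dataset.
SAMPLE = (
    "spec_id,mol_id,smiles,prec_type,ion_mode,prec_mz,peaks\n"
    'S1,M1,CCO,[M+H]+,positive,47.0491,"[(29.04, 100.0), (45.03, 20.5)]"\n'
)

if __name__ == "__main__":
    for spectrum in load_spectra(io.StringIO(SAMPLE)):
        # The base peak is the pair with the highest intensity.
        base_mz, base_intensity = max(spectrum["peaks"], key=lambda p: p[1])
        print(spectrum["spec_id"], spectrum["prec_type"], base_mz, base_intensity)
```

To read a real file, pass an opened handle instead of the in-memory sample, for example open("train.csv", newline=""), using whichever path the archive actually extracts to.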
Created:
2026-01-24



