FlexMS Benchmark Datasets: Mass Spectrum Prediction for Metabolomics

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/FlexMS_Benchmark_Datasets_Mass_Spectrum_Prediction_for_Metabolomics/31144573
Description:
This dataset accompanies the paper "FlexMS: A Flexible Framework for Benchmarking Deep Learning-based Mass Spectrum Prediction Tools in Metabolomics."

Overview
FlexMS is a flexible framework for constructing and evaluating deep learning models for tandem mass spectrum (MS/MS) prediction. This repository contains the preprocessed benchmark datasets used in the paper.

Dataset Contents
The archive contains four benchmark datasets in FlexMS-compatible CSV format:

1. GNPS (Global Natural Products Social Molecular Networking)
   Size: ~322,000 spectra from ~16,000 molecules
   Splits: Random and Scaffold
   Files: train.csv, valid.csv, test.csv
2. MassBank
   Size: ~62,000 spectra from ~4,000 molecules
   Split: Scaffold
   Files: train.csv, valid.csv, test.csv
3. MassSpecGym
   Size: ~231,000 spectra from ~29,000 molecules
   Split: Predefined by the original dataset
   Files: MassSpecGym_train.csv, MassSpecGym_valid.csv, MassSpecGym_test.csv
4. MIST-CANOPUS (NPLIB1)
   Size: ~8,000 spectra from ~7,000 molecules
   Splits: 3-fold cross-validation
   Files: split_0/, split_1/, split_2/ (each with train, val, test)

Data Format
Each CSV file contains the following columns:
   spec_id: Spectrum identifier
   mol_id: Molecule identifier
   smiles: Molecular SMILES string
   inchikey_s: InChIKey
   formula: Molecular formula
   mw: Molecular weight
   prec_type: Precursor ion type (e.g., [M+H]+)
   inst_type: Instrument type (e.g., QTOF, Orbitrap)
   ion_mode: Ionization mode (positive/negative)
   ace/nce: Absolute/normalized collision energy
   prec_mz: Precursor m/z
   peaks: List of (m/z, intensity) pairs

Usage
Extract the archive and use it with the FlexMS framework:
   tar -xzvf FlexMS_data.tar.gz
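As an alternative to the tar command, the archive can be unpacked from Python with the standard-library tarfile module, after which the CSV files can be grouped by folder to locate each split. This is a sketch only: the helper names (extract_archive, group_csvs_by_folder, demo) are hypothetical, and the folder names in the toy archive built by demo() are invented for illustration, not the real layout of FlexMS_data.tar.gz.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_archive(archive: Path, dest: Path) -> Path:
    """Rough Python equivalent of `tar -xzvf <archive> -C <dest>`."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # The "data" filter (available in newer Python releases) rejects
        # unsafe members such as absolute paths or "../" entries.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


def group_csvs_by_folder(root: Path) -> dict[str, list[str]]:
    """Map each folder (relative to root) to the sorted CSV names it holds."""
    groups: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.csv")):
        folder = path.parent.relative_to(root).as_posix()
        groups.setdefault(folder, []).append(path.name)
    return groups


def demo() -> dict[str, list[str]]:
    # HYPOTHETICAL layout: these folder/file names are illustrative only.
    files = [
        "mist_canopus/split_0/train.csv",
        "mist_canopus/split_0/test.csv",
        "massbank/train.csv",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        src = tmp_path / "src"
        for rel in files:
            f = src / rel
            f.parent.mkdir(parents=True, exist_ok=True)
            f.write_text("spec_id,smiles\n")
        # Build a small .tar.gz so the extraction path can be exercised.
        archive = tmp_path / "toy.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            for rel in files:
                tar.add(src / rel, arcname=rel)
        out = extract_archive(archive, tmp_path / "out")
        return group_csvs_by_folder(out)


print(demo())
```

Grouping by parent folder keeps the MIST-CANOPUS cross-validation folds (split_0/, split_1/, split_2/) separate from datasets whose train/valid/test files sit side by side.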
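A minimal Python sketch of reading one of these CSV files with the standard-library csv module follows. It assumes the peaks column holds a Python-style list literal of (m/z, intensity) pairs, which the description above does not specify, so the parser may need adjusting to the actual serialization. The helper names (Spectrum, parse_peaks, normalize, read_spectra) and the single sample row are hypothetical, only a subset of the documented columns is read, and scaling intensities to the base peak is an illustrative choice rather than a FlexMS convention.

```python
import ast
import csv
import io
from dataclasses import dataclass


@dataclass
class Spectrum:
    spec_id: str
    smiles: str
    prec_type: str
    prec_mz: float
    peaks: list[tuple[float, float]]


def parse_peaks(raw: str) -> list[tuple[float, float]]:
    # ASSUMPTION: peaks are stored as a list literal such as
    # "[(41.04, 12.0), (55.05, 100.0)]"; the real format may differ.
    pairs = ast.literal_eval(raw)
    return [(float(mz), float(intensity)) for mz, intensity in pairs]


def normalize(peaks: list[tuple[float, float]]) -> list[tuple[float, float]]:
    # Scale intensities so the most intense (base) peak equals 1.0.
    top = max((intensity for _, intensity in peaks), default=0.0)
    if top == 0:
        return list(peaks)
    return [(mz, intensity / top) for mz, intensity in peaks]


def read_spectra(handle):
    # Yields one Spectrum per CSV row, reading only a few documented columns.
    for row in csv.DictReader(handle):
        yield Spectrum(
            spec_id=row["spec_id"],
            smiles=row["smiles"],
            prec_type=row["prec_type"],
            prec_mz=float(row["prec_mz"]),
            peaks=normalize(parse_peaks(row["peaks"])),
        )


# HYPOTHETICAL sample row (made up for illustration, not from the dataset).
SAMPLE_CSV = (
    "spec_id,smiles,prec_type,prec_mz,peaks\n"
    'demo_1,CCO,[M+H]+,47.0491,"[(29.0386, 50.0), (47.0491, 100.0)]"\n'
)

for spec in read_spectra(io.StringIO(SAMPLE_CSV)):
    print(spec.spec_id, spec.prec_mz, spec.peaks)
```

For a real split file, pass an open file handle instead of the in-memory sample, e.g. `open("train.csv", newline="")`.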
Created:
2026-01-24