~500k MS2 spectra with 70k SMILES, InChIKey, NPC & ClassyFire annotations
Source: DataCite Commons. Updated: 2026-05-05. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20036408
Description:
This dataset is a zstd-compressed MGF containing a harmonized subset of GNPS public library and MassSpecGym MS/MS spectra.
Each spectrum includes:
- RDKit tautomer-canonical `SMILES`
- RDKit-derived `INCHIKEY`
- `FORMULA` derived from the canonical SMILES using `smiles-parser`
- NPClassifier.rs Faithful CUDA f32 predictions at pathway, superclass, and class levels
- ClassyFire/ChemOnt labels, including hierarchy path and direct parent
- Source provenance fields for GNPS or MassSpecGym
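As a rough illustration of how these per-spectrum fields might be read, the sketch below parses MGF text into dictionaries. It assumes the usual `BEGIN IONS` / `END IONS` block layout with `KEY=value` header lines followed by `m/z intensity` pairs. Only `SMILES`, `INCHIKEY`, and `FORMULA` are key names taken from the description above; `PEPMASS` and `CHARGE` are conventional MGF keys assumed here, `parse_mgf` is a hypothetical helper, and the example record uses toy values that do not come from the dataset.

```python
import io


def parse_mgf(lines):
    """Yield one dict per spectrum from an iterable of MGF text lines."""
    spectrum = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            spectrum = {"fields": {}, "peaks": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is None:
            continue  # ignore anything outside a BEGIN/END block
        elif "=" in line and not line[0].isdigit():
            # Header line: split on the first "=" only, since SMILES
            # strings can themselves contain "=" (double bonds).
            key, value = line.split("=", 1)
            spectrum["fields"][key.strip().upper()] = value.strip()
        else:
            # Peak line: "m/z intensity", whitespace separated.
            parts = line.split()
            if len(parts) >= 2:
                spectrum["peaks"].append((float(parts[0]), float(parts[1])))


# Toy record for illustration only (ethanol; peak values are made up).
example = """BEGIN IONS
PEPMASS=47.0491
CHARGE=1+
SMILES=CCO
INCHIKEY=LFQSCWFLJHTTHZ-UHFFFAOYSA-N
FORMULA=C2H6O
19.018 120.0
29.039 80.5
47.049 300.0
END IONS
"""

spectra = list(parse_mgf(io.StringIO(example)))
for s in spectra:
    print(s["fields"].get("SMILES"), len(s["peaks"]))
```

The archive itself would first need to be opened through a zstd-decompressing text stream (for example, with the third-party `zstandard` package); that step is left out so the sketch stays standard-library only.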
Filtering applied:
- Removed spectra without usable SMILES
- Removed spectra with zero precursor m/z or zero charge
- Kept spectra with at least 3 peaks
- Retained at most the top 60 peaks per spectrum before SPLASH collision handling
- Removed all spectra involved in duplicate or conflicting post-top60 SPLASH groups
- Kept only spectra with all three NPC layers populated and usable ChemOnt labels
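The filtering steps above can be sketched roughly as follows. This is not the pipeline used to build the dataset. The function names are hypothetical, "top 60" is assumed to mean the 60 most intense peaks, and the grouping key is a generic stand-in for a real SPLASH computation, which is not implemented here. The one behavior taken directly from the description is that every member of a colliding group is dropped, rather than one representative being kept.

```python
from collections import defaultdict


def passes_basic_filters(smiles, precursor_mz, charge, peaks, min_peaks=3):
    """Apply the SMILES, precursor m/z, charge, and peak-count checks."""
    if not smiles or not smiles.strip():
        return False
    if precursor_mz == 0 or charge == 0:
        return False
    return len(peaks) >= min_peaks


def top_n_peaks(peaks, n=60):
    """Keep the n most intense peaks, returned in ascending m/z order.

    Assumption: ranking is by intensity; tie handling is arbitrary here.
    """
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:n]
    return sorted(strongest)


def drop_collision_groups(spectra, key):
    """Drop every spectrum whose key is shared with another spectrum.

    `key` stands in for a SPLASH computed on the top-60 peak list.
    """
    groups = defaultdict(list)
    for s in spectra:
        groups[key(s)].append(s)
    # Only groups with exactly one member survive.
    return [g[0] for g in groups.values() if len(g) == 1]


# Toy data: 100 peaks whose intensity equals their m/z.
peaks = [(float(i), float(i)) for i in range(1, 101)]
trimmed = top_n_peaks(peaks)
print(len(trimmed), trimmed[0], trimmed[-1])

# Toy keys: "a" appears twice, so both copies are removed.
kept = drop_collision_groups(["a", "b", "a", "c"], key=lambda s: s)
print(kept)
```

Removing whole groups is stricter than ordinary deduplication: it discards some valid spectra, but no retained spectrum then shares a key with another.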
Final contents:
- Spectra: 522,678
- Unique canonical SMILES: 48,689
- Unique NPC labels: 595
- Unique ChemOnt labels: 1,942
Provider:
Zenodo
Created:
2026-05-05



