
~500k MS2 spectra with 70k SMILES, InChIKey, NPC & ClassyFire annotations

DataCite Commons | Updated 2026-05-05 | Indexed 2026-05-07
Download link:
https://zenodo.org/doi/10.5281/zenodo.20036408
Description:
This dataset is a zstd-compressed MGF containing a harmonized subset of GNPS public library and MassSpecGym MS/MS spectra.

Each spectrum includes:

- RDKit tautomer-canonical `SMILES`
- RDKit-derived `INCHIKEY`
- `FORMULA` derived from the canonical SMILES using `smiles-parser`
- NPClassifier.rs Faithful CUDA f32 predictions at pathway, superclass, and class levels
- ClassyFire/ChemOnt labels, including hierarchy path and direct parent
- Source provenance fields for GNPS or MassSpecGym

Filtering applied:

- Removed spectra without usable SMILES
- Removed spectra with zero precursor m/z or zero charge
- Kept spectra with at least 3 peaks
- Retained at most the top 60 peaks per spectrum before SPLASH collision handling
- Removed all spectra involved in duplicate or conflicting post-top60 SPLASH groups
- Kept only spectra with all three NPC layers populated and usable ChemOnt labels

Final contents:

- Spectra: 522,678
- Unique canonical SMILES: 48,689
- Unique NPC labels: 595
- Unique ChemOnt labels: 1,942
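The filtering steps described above can be illustrated with a minimal Python sketch. It is not the dataset authors' pipeline. The `Spectrum` class, the function names, and the constants are hypothetical. The sketch assumes that "top 60" means the 60 most intense peaks and that a non-empty string counts as a "usable" SMILES (the real pipeline presumably validates with RDKit). It also uses the tuple of retained peaks as a stand-in for the SPLASH hash, which it does not compute.

```python
from collections import Counter
from dataclasses import dataclass

MIN_PEAKS = 3   # from the description: keep spectra with at least 3 peaks
TOP_N = 60      # from the description: retain at most the top 60 peaks


@dataclass
class Spectrum:
    """Hypothetical in-memory record for one MGF entry."""
    smiles: str
    precursor_mz: float
    charge: int
    peaks: list  # list of (m/z, intensity) tuples


def keep_top_peaks(peaks, n=TOP_N):
    """Keep the n most intense peaks (assumption), in ascending m/z order."""
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:n]
    return sorted(strongest)


def filter_spectrum(spec):
    """Return a trimmed copy of spec, or None if any per-spectrum rule rejects it."""
    if not spec.smiles.strip():                    # assumption: non-empty means usable
        return None
    if spec.precursor_mz == 0 or spec.charge == 0:
        return None
    if len(spec.peaks) < MIN_PEAKS:
        return None
    return Spectrum(spec.smiles, spec.precursor_mz, spec.charge,
                    keep_top_peaks(spec.peaks))


def peak_key(spec):
    """Stand-in for a SPLASH hash: NOT the real SPLASH algorithm."""
    return tuple(spec.peaks)


def drop_collisions(spectra, key=peak_key):
    """Remove every spectrum whose key is shared with at least one other spectrum."""
    counts = Counter(key(s) for s in spectra)
    return [s for s in spectra if counts[key(s)] == 1]
```

Note that `drop_collisions` discards all members of a colliding group rather than keeping one representative, which matches the description's wording ("removed all spectra involved"). The final rule, requiring populated NPC layers and ChemOnt labels, is omitted because it depends on annotation fields whose exact names are not given in the description.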
Provider:
Zenodo
Created:
2026-05-05