Public Data files for MassFormer
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/7762057
Description:
Public data files for experiments in MassFormer. See the GitHub repository for instructions on how to use this data.
Raw Data:
casmi_2016.tgz - Critical Assessment of Small Molecule Identification 2016, used for model evaluation.
casmi_2022.tgz - Critical Assessment of Small Molecule Identification 2022, used for model evaluation.
mb_na_msms.msp.gz - MassBank of North America export of LC-MS/MS spectra, used for model evaluation.
cid_smiles.tsv.gz - Mapping of CID to SMILES strings, obtained from PubChem.
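As a rough illustration of how the CID-to-SMILES mapping might be read, here is a minimal Python sketch. The column layout (CID first, SMILES second, tab-separated) and the absence of a header row are assumptions, not something this listing states; the function name `read_cid_smiles` and its `has_header` parameter are hypothetical. Check the file and the code repository before relying on it.

```python
import csv
import gzip


def read_cid_smiles(path, has_header=False):
    """Return a dict mapping CID (int) to SMILES string.

    Assumption: two tab-separated columns, CID then SMILES.
    Set has_header=True if the first row turns out to be column names.
    """
    mapping = {}
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        if has_header:
            next(reader, None)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            mapping[int(row[0])] = row[1]
    return mapping
```

Calling `read_cid_smiles("cid_smiles.tsv.gz")` would then give a dictionary keyed by integer CID, provided the assumed layout holds.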
Processed Data:
proc_casmi_2016.tgz - Processed spectrum and molecule data for the CASMI 2016 benchmark.
proc_casmi_2022.tgz - Processed spectrum and molecule data for the CASMI 2022 benchmark.
proc_nist20_outlier.tgz - Processed spectrum and molecule data for the NIST20 Outlier benchmark (formerly called pseudo-CASMI).
proc_demo.tgz - Processed spectrum and molecule data for the demo (refer to code repository for more information).
cfm.tgz - Predicted spectra for the Competitive Fragmentation Modelling (CFM) baseline.
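The .tgz files listed above are gzip-compressed tar archives. A minimal sketch of unpacking one with Python's standard library follows; the helper name `extract_archive` is hypothetical, and the directory layout the MassFormer code expects is not described here, so the destination path is an assumption to be checked against the code repository.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tgz archive into dest_dir and return the extracted file paths.

    Member paths are checked first so nothing is written outside dest_dir.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions provide extraction filters.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    # List extracted files relative to the destination directory.
    return sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )
```

For example, `extract_archive("proc_demo.tgz", "data")` would unpack the demo data into a local `data` directory (a placeholder name) and return the list of files it contains.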
Model Checkpoints:
demo.pkl - Checkpoint of a MassFormer model trained on MoNA data, for the purposes of running the demo.
checkpoint_best_pcqm4mv2.pt - Checkpoint of a Graphormer model pretrained on the PCQM4M dataset, used for initialization of some MassFormer models. Copied from this url. Please refer to the Graphormer repository for more information.
Created:
2023-10-03



