MassIVE-KB v1 30 million PSMs training/validation/test splits
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14967860
Description:
The MassIVE-KB data are derived from the peptide-spectrum matches (PSMs) used to compile the MassIVE-KB v1 spectral library and consist of approximately 30 million PSMs. The PSMs were obtained by collecting up to the top 100 PSMs for each of the 2,154,269 precursors (each defined by a peptidoform and a charge) included in the MassIVE-KB v1 spectral library.
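The "up to the top 100 PSMs per precursor" selection can be sketched as follows. This is only an illustration, not the procedure used to build the dataset: the description does not say how PSMs were ranked, so the `score` field (higher is better) and the dictionary keys `peptidoform` and `charge` are hypothetical names chosen for the example.

```python
import heapq
from collections import defaultdict


def top_psms_per_precursor(psms, n=100):
    """Keep at most `n` best-scoring PSMs for each precursor.

    A precursor is the pair (peptidoform, charge). The "score" key is an
    assumed ranking criterion; the source does not name one.
    """
    groups = defaultdict(list)
    for psm in psms:
        groups[(psm["peptidoform"], psm["charge"])].append(psm)

    # heapq.nlargest returns the rows sorted from best to worst score.
    return {
        precursor: heapq.nlargest(n, rows, key=lambda row: row["score"])
        for precursor, rows in groups.items()
    }
```

Precursors with fewer than `n` PSMs simply keep all of them, which matches the "up to" wording.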
The data are split into peptide-disjoint training, validation, and test sets, consisting of:
Training: 28,508,636 PSMs for 1,496,701 unique peptidoforms.
Validation: 1,000,234 PSMs for 52,379 unique peptidoforms.
Test: 996,027 PSMs for 52,399 unique peptidoforms.
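One way to obtain and verify a peptide-disjoint split is to assign every PSM by a hash of its peptide key, so that all PSMs of the same peptide land in the same set. The sketch below is an assumption about how such a split could be made, not the method used for this dataset; the key name `peptide` and the default fractions are hypothetical.

```python
import hashlib
from collections import defaultdict


def assign_split(peptide, val_frac=0.05, test_frac=0.05):
    """Deterministically map a peptide key to a split name."""
    digest = hashlib.sha256(peptide.encode("utf-8")).digest()
    # Scale the first 8 bytes of the digest to a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "validation"
    return "training"


def split_psms(psms, key="peptide"):
    """Group PSMs into splits; PSMs sharing a key always stay together."""
    splits = defaultdict(list)
    for psm in psms:
        splits[assign_split(psm[key])].append(psm)
    return dict(splits)


def is_disjoint(splits, key="peptide"):
    """Return True if no key value occurs in more than one split."""
    owner = {}
    for name, rows in splits.items():
        for row in rows:
            if owner.setdefault(row[key], name) != name:
                return False
    return True
```

Because the assignment depends only on the peptide key, the split is reproducible across runs and `is_disjoint(split_psms(psms))` holds by construction. The same check can be run on the published training, validation, and test sets.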
The dataset was originally compiled through the following steps:
On the MassIVE website, go to MassIVE Knowledge Base > Human HCD Spectral Library > All Candidate library spectra > Download.
This yields a zipped TSV file with the metadata and peptide identifications for all 30 million PSMs.
Use the filename (column "filename") to retrieve the corresponding peak files from the MassIVE FTP server (this was done with a wget script), then extract the desired spectra by their scan number (column "scan").
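The metadata step above can be sketched as follows: read the unzipped TSV and collect, for each peak file, the scan numbers to extract. Only the column names "filename" and "scan" come from the description; the function name is hypothetical, and scan values are kept as strings because their exact format in the file is not stated here. Downloading the peak files and parsing spectra are left out.

```python
import csv
from collections import defaultdict


def scans_by_file(tsv_handle):
    """Map each peak-file name to the set of scan numbers listed for it.

    `tsv_handle` is an open text file (or any iterable of lines) holding
    the tab-separated PSM metadata with "filename" and "scan" columns.
    """
    wanted = defaultdict(set)
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    for row in reader:
        wanted[row["filename"]].add(row["scan"])
    return dict(wanted)
```

Grouping by file first means each peak file only needs to be downloaded and opened once, however many of its spectra are used.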
Created:
2025-03-10



