
GNPS - GLEAMS clustering dark proteome

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://www.omicsdi.org/dataset/gnps/MSV000088598
Resource summary:
GLEAMS is a deep neural network that embeds spectra into a low-dimensional space in which spectra generated by the same peptide lie close to one another. We used GLEAMS as the basis for a large-scale spectrum clustering that detects groups of unidentified, proximal spectra representing the same peptide. GLEAMS embedded 669 million spectra from the MassIVE-KB dataset, and the embeddings were then grouped by hierarchical clustering with average linkage. Medoid spectra were extracted from clusters consisting only of unidentified spectra, yielding 45 million medoid spectra that represent 257 million clustered spectra. The medoid spectra were split into two groups by cluster size (size two and size greater than two) and exported to two MGF files. ANN-SoLo was used for open modification searching and identified 5.3 million peptide-spectrum matches. Here we present the originally unidentified cluster medoid spectra and the ANN-SoLo identification results as a community resource. The dataset supports further exploration of the dark proteome through spectra that are observed repeatedly across many experiments but consistently remain unidentified.
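
The clustering and medoid-selection steps described above can be illustrated on toy data. The following is a minimal sketch, not the GLEAMS pipeline itself: it assumes Euclidean distance between embeddings and a hypothetical distance threshold for cutting the dendrogram, neither of which is stated in this summary, and the function names `cluster_medoids` and `split_by_size` are made up for illustration. It uses SciPy's in-memory hierarchical clustering, which would not scale to hundreds of millions of spectra.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import cdist


def cluster_medoids(embeddings, threshold):
    """Average-linkage clustering; returns (labels, {label: medoid index})."""
    # Build the average-linkage dendrogram (Euclidean distance is an assumption).
    tree = linkage(embeddings, method="average", metric="euclidean")
    # Cut the dendrogram at a distance threshold (hypothetical value).
    labels = fcluster(tree, t=threshold, criterion="distance")
    medoids = {}
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        # The medoid is the member with the smallest summed distance
        # to all other members of its cluster.
        dists = cdist(embeddings[members], embeddings[members])
        medoids[int(label)] = int(members[dists.sum(axis=1).argmin()])
    return labels, medoids


def split_by_size(labels, medoids):
    """Split medoid indices into size-two clusters and larger clusters."""
    uniq, counts = np.unique(labels, return_counts=True)
    sizes = {int(u): int(c) for u, c in zip(uniq, counts)}
    pairs = sorted(m for lab, m in medoids.items() if sizes[lab] == 2)
    larger = sorted(m for lab, m in medoids.items() if sizes[lab] > 2)
    return pairs, larger


# Toy 2-D "embeddings": one pair, one triple, and one isolated point.
embeddings = np.array([
    [0.0, 0.0], [0.1, 0.0],              # cluster of size two
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],  # cluster of size three
    [10.0, 0.0],                         # singleton, not reported
])
labels, medoids = cluster_medoids(embeddings, threshold=1.0)
pairs, larger = split_by_size(labels, medoids)
print("medoids of size-two clusters:", pairs)
print("medoids of larger clusters:", larger)
```

In this sketch the two returned lists play the role of the two exported MGF files: one holds the medoids of size-two clusters, the other the medoids of clusters with more than two members. Singletons fall into neither group.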
Created:
2021-12-21