Increasing the Scale of the Mass Spectrometry Query Language Compendium with Explainable AI
Source: Figshare | Updated: 2025-08-25 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/Increasing_the_Scale_of_the_Mass_Spectrometry_Query_Language_Compendium_with_Explainable_AI/29978743
Description:
A significant bottleneck in metabolomics data interpretation is the effective use of domain knowledge to assign structural information based on fragmentation patterns. The mass spectrometry query language (MassQL) aims to make this process accessible and applicable across multiple analysis platforms. While advanced computational methods can predict compound structures from fragmentation data, AI/ML approaches often rely on complex, opaque criteria that are difficult to interpret or modify. As a result, their predictive patterns cannot be readily translated into human-readable rules, such as those used in MassQL. In this study, we introduce ChemEcho, a machine learning embedding method that converts tandem mass spectrometry data into sparse feature vectors containing peak and neutral mass subformulae to enhance explainable AI/ML-based methods. An advantage of this approach is that decision trees trained on these feature vectors can be directly translated to MassQL. Using a battery of decision trees trained on ChemEcho embeddings to predict molecular attributes, we generated over 1500 MassQL queries for 765 molecular features and evaluated their precision and recall. From these queries, the 50 highest-performing queries were integrated into the MassQL compendium. This set of generated MassQL queries included environmentally and biologically relevant classes such as PFAS and molecules containing phosphate or sulfate substructures. To illustrate the impact these queries would have on a typical metabolomics experiment, the queries were applied to a public metabolomics data set, resulting in a marked increase in the structural information derived from tandem mass spectra. Access to and reuse of these queries are expected to enhance structural annotation in untargeted experiments, leading to more specific claims and advancing many applications in metabolomics.
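The claim that a decision tree over peak and neutral-loss features maps directly onto MassQL can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from ChemEcho: the `Node`/`Leaf` classes, the hand-built tree, the m/z values (roughly H2PO4- and a loss of H3PO4), and the 0.01 tolerance are all hypothetical. The sketch also assumes that the MassQL keywords `MS2PROD` and `MS2NL` and the qualifiers `TOLERANCEMZ` and `EXCLUDED` behave as described in the MassQL documentation. Each root-to-leaf path that ends in a positive prediction becomes one query, with "feature absent" branches rendered as excluded conditions.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple, Union


@dataclass
class Leaf:
    label: bool  # True if the tree predicts the molecular attribute


@dataclass
class Node:
    kind: str    # "peak" (fragment ion) or "loss" (neutral loss); hypothetical naming
    mass: float  # m/z of the fragment or mass of the neutral loss
    present: Union["Node", Leaf]  # branch taken when the feature is observed
    absent: Union["Node", Leaf]   # branch taken when it is not


Condition = Tuple[str, float, bool]


def positive_paths(tree, path: Tuple[Condition, ...] = ()) -> Iterator[List[Condition]]:
    """Yield every root-to-leaf path that ends in a positive leaf."""
    if isinstance(tree, Leaf):
        if tree.label:
            yield list(path)
        return
    yield from positive_paths(tree.present, path + ((tree.kind, tree.mass, True),))
    yield from positive_paths(tree.absent, path + ((tree.kind, tree.mass, False),))


def to_massql(path: List[Condition], tol: float = 0.01) -> str:
    """Render one path as a MassQL query string (syntax assumed from MassQL docs)."""
    clauses = []
    for kind, mass, present in path:
        keyword = "MS2PROD" if kind == "peak" else "MS2NL"
        clause = f"{keyword}={mass:.4f}:TOLERANCEMZ={tol}"
        if not present:
            clause += ":EXCLUDED"  # assumed qualifier for "must not be observed"
        clauses.append(clause)
    query = "QUERY scaninfo(MS2DATA)"
    return query + " WHERE " + " AND ".join(clauses) if clauses else query


# Hypothetical two-level tree for a phosphate-like attribute (illustrative values).
tree = Node(
    "peak", 96.9696,
    present=Leaf(True),
    absent=Node("loss", 97.9769, present=Leaf(True), absent=Leaf(False)),
)

queries = [to_massql(p) for p in positive_paths(tree)]
for q in queries:
    print(q)
```

A tree fitted with a real library would be traversed the same way; only the node representation would change. Generating one query per positive leaf keeps each rule short and human-readable, at the cost of producing several queries per attribute.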
Created:
2025-08-25



