Increasing the Scale of the Mass Spectrometry Query Language Compendium with Explainable AI
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Increasing_the_Scale_of_the_Mass_Spectrometry_Query_Language_Compendium_with_Explainable_AI/29978743
Description:
A significant bottleneck in metabolomics data interpretation is
the effective use of domain knowledge to assign structural information
based on fragmentation patterns. The mass spectrometry query language
(MassQL) aims to make this process accessible and applicable across
multiple analysis platforms. While advanced computational methods
are capable of predicting compound structures from fragmentation data,
AI/ML approaches often rely on complex, opaque criteria that are difficult
to interpret or modify. As a result, their predictive patterns cannot
be readily translated into human-readable rules, such as those used
in MassQL. In this study, we introduce ChemEcho, a machine learning
embedding method that converts tandem mass spectrometry data into
sparse feature vectors containing peak and neutral mass subformulae
to enhance explainable AI/ML-based methods. An advantage of this approach
is that decision trees trained using these feature vectors can be
directly translated to MassQL. Using a battery of decision trees trained
on ChemEcho embeddings to predict molecular attributes, we generated
over 1500 MassQL queries for 765 molecular features and evaluated
their precision and recall. From these queries, the 50 highest-performing
queries were integrated into the MassQL compendium. This set of generated
MassQL queries included environmentally and biologically relevant
classes such as PFAS and molecules containing phosphate or sulfate
substructures. To illustrate the impact these queries would have on
a typical metabolomics experiment, these MassQL queries were applied
to a public metabolomics data set, resulting in a marked increase
in the structural information derived from tandem mass spectra. Access
and reuse of these queries are expected to enhance structural annotation
in untargeted experiments, leading to more specific claims and advancing
many applications in metabolomics.
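
The translation from a decision tree to MassQL described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ChemEcho implementation: the `Leaf` and `Split` classes, the toy tree, and the two sulfate-related features (an HSO4- product ion near m/z 96.9601 and an SO3 neutral loss near 79.9568) are hypothetical, and features are keyed by m/z rather than by subformula for brevity. The emitted strings follow the general MassQL pattern (`MS2PROD`, `MS2NL`, `TOLERANCEMZ`, `EXCLUDED`) and should be checked against the MassQL documentation before use.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    label: bool  # True if the tree predicts the molecular attribute


@dataclass
class Split:
    kind: str    # "peak" (product ion) or "nl" (neutral loss); hypothetical encoding
    mz: float    # mass of the feature being tested for presence
    absent: Union["Split", Leaf]   # branch taken when the feature is missing
    present: Union["Split", Leaf]  # branch taken when the feature is observed


Path = List[Tuple[Split, bool]]


def positive_paths(node, path=()) -> List[Path]:
    """Collect every root-to-leaf path that ends in a positive prediction."""
    if isinstance(node, Leaf):
        return [list(path)] if node.label else []
    return (positive_paths(node.absent, path + ((node, False),))
            + positive_paths(node.present, path + ((node, True),)))


def path_to_query(path: Path, tol: float = 0.01) -> str:
    """Render one positive path as a MassQL-style query string (assumed syntax)."""
    keyword = {"peak": "MS2PROD", "nl": "MS2NL"}
    conditions = []
    for split, present in path:
        cond = f"{keyword[split.kind]}={split.mz:.4f}:TOLERANCEMZ={tol}"
        if not present:
            cond += ":EXCLUDED"  # feature must NOT appear in the spectrum
        conditions.append(cond)
    if not conditions:
        return "QUERY scaninfo(MS2DATA)"
    return "QUERY scaninfo(MS2DATA) WHERE " + " AND ".join(conditions)


# Toy tree: predict "contains sulfate" from two presence/absence features.
tree = Split(
    kind="peak", mz=96.9601,
    present=Leaf(True),
    absent=Split(kind="nl", mz=79.9568, present=Leaf(True), absent=Leaf(False)),
)

queries = [path_to_query(p) for p in positive_paths(tree)]
for q in queries:
    print(q)
```

Each positive leaf yields one query, so a single tree can produce several queries for the same molecular attribute. Each query could then be scored for precision and recall against labeled spectra, in the spirit of the evaluation the abstract describes.
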
Created:
2025-08-25



