Unsupervised Discovery and Comparison of Structural Families Across Multiple Samples in Untargeted Metabolomics

NIAID Data Ecosystem, indexed 2026-03-10

Download link:

https://figshare.com/articles/dataset/Unsupervised_Discovery_and_Comparison_of_Structural_Families_Across_Multiple_Samples_in_Untargeted_Metabolomics/5176480

Description:

In untargeted metabolomics approaches, the inability to structurally annotate relevant features and map them to biochemical pathways is hampering the full exploitation of many metabolomics experiments. Furthermore, variable metabolic content across samples results in sparse feature matrices that are statistically hard to handle. Here, we introduce MS2LDA+, which tackles both of these problems. Previously, we presented MS2LDA, which extracts biochemically relevant molecular substructures (“Mass2Motifs”) from a collection of fragmentation spectra as sets of co-occurring molecular fragments and neutral losses, thereby recognizing building blocks of metabolomics. Here, we extend MS2LDA to handle multiple metabolomics experiments in one analysis, resulting in MS2LDA+. By linking Mass2Motifs across samples, we expose the variability in prevalence of structurally related metabolite families. We validate the differential prevalence of substructures between two distinct sample groups and apply it to fecal samples. Subsequently, within one sample group of urines, we rank the Mass2Motifs based on their variance to assess whether xenobiotic-derived substructures are among the most-variant Mass2Motifs. Indeed, we could ascribe 22 out of the 30 most-variant Mass2Motifs to xenobiotic-derived substructures, including paracetamol/acetaminophen mercapturate and dimethylpyrogallol. In total, we structurally characterized 101 Mass2Motifs with biochemically or chemically relevant substructures. Finally, we combined the discovered metabolite families with full scan feature intensity information to obtain insight into core metabolites present in most samples and rare metabolites present in small subsets, now linked through their common substructures. We conclude that, by biochemical grouping of metabolites across samples, MS2LDA+ aids in structural annotation of metabolites and guides prioritization of analysis by using Mass2Motif prevalence.
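The variance-based ranking mentioned in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the description does not say how Mass2Motif prevalence or variance is computed, so the sketch assumes prevalence is a per-sample number between 0 and 1 and uses plain population variance. The function name `rank_motifs_by_variance`, the motif names, and all values are hypothetical toy data.

```python
from statistics import pvariance


def rank_motifs_by_variance(prevalence):
    """Rank motifs from most to least variable across samples.

    prevalence: dict mapping a motif name to a list of per-sample
    prevalence values (assumed here to be fractions between 0 and 1).
    """
    # Population variance of each motif's prevalence across samples.
    variances = {motif: pvariance(values) for motif, values in prevalence.items()}
    # Highest variance first.
    return sorted(variances, key=variances.get, reverse=True)


# Hypothetical toy data: prevalence of three motifs in four samples.
toy_prevalence = {
    "motif_core": [0.30, 0.31, 0.29, 0.30],        # present at similar levels everywhere
    "motif_xenobiotic": [0.00, 0.45, 0.02, 0.00],  # dominant in a single sample
    "motif_moderate": [0.10, 0.20, 0.15, 0.12],
}

ranking = rank_motifs_by_variance(toy_prevalence)
print(ranking)
```

In this toy case the motif that appears in only one sample ranks first, which mirrors the expectation that substructures from sporadic exposures such as drugs vary most between samples.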

Created:

2017-07-21
