compMS2Miner: An Automatable Metabolite Identification, Visualization, and Data-Sharing R Package for High-Resolution LC–MS Data Sets

NIAID Data Ecosystem, indexed 2026-03-10

Download link:

https://figshare.com/articles/dataset/compMS2Miner_An_Automatable_Metabolite_Identification_Visualization_and_Data-Sharing_R_Package_for_High-Resolution_LC_MS_Data_Sets/4789384

Description:

A long-standing challenge of untargeted metabolomic profiling by ultrahigh-performance liquid chromatography–high-resolution mass spectrometry (UHPLC–HRMS) is the efficient transition from unknown mass spectral features to confident metabolite annotations. The compMS2Miner (Comprehensive MS2 Miner) package was developed in the R language to facilitate rapid, comprehensive feature annotation using a peak-picker output and MS2 data files as inputs. The number of MS2 spectra that can be collected during a metabolomic profiling experiment far exceeds what painstaking manual interpretation can cover in the time available; therefore, a degree of software workflow autonomy is required for broad-scale metabolite annotation. compMS2Miner integrates many useful tools in a single workflow for metabolite annotation and also provides an overview of the MS2 data through a Web application GUI, compMS2Explorer (Comprehensive MS2 Explorer), which also facilitates data sharing and transparency. The automatable compMS2Miner workflow consists of the following steps: (i) matching unknown MS1 features to precursor MS2 scans; (ii) filtration of spectral noise (dynamic noise filter); (iii) generation of composite mass spectra by summing the signals of multiple similar spectra and removing redundant/contaminant spectra; (iv) interpretation of possible fragment ion substructures using an internal database; (v) annotation of unknowns with chemical and spectral databases, with prediction of mammalian biotransformation metabolites, wrapper functions for in silico fragmentation software, nearest-neighbor chemical similarity scoring, random-forest-based retention time prediction, text-mining-based false positive removal/true positive ranking, chemical taxonomic prediction, and differential-evolution-based global annotation score optimization; and (vi) network graph visualization, data curation, and sharing via the compMS2Explorer application.
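Steps (i) and (iii) of the workflow lend themselves to a small illustration. The following is a minimal Python sketch, not compMS2Miner code (the package itself is written in R). The class names, the function names, the 10 ppm and 20 s tolerances, and the fixed 0.01 m/z binning are all assumptions chosen for illustration; the package's actual criteria for grouping similar spectra may well differ.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """An MS1 feature from a peak-picker output (hypothetical fields)."""
    name: str
    mz: float
    rt: float  # retention time in seconds


@dataclass
class Scan:
    """An MS2 scan: precursor m/z, retention time, and fragment peaks."""
    precursor_mz: float
    rt: float
    peaks: list  # list of (fragment m/z, intensity) tuples


def ppm_error(observed: float, reference: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - reference) / reference * 1e6


def match_scans(feature, scans, ppm_tol=10.0, rt_tol=20.0):
    """Step (i), simplified: keep scans whose precursor m/z and retention
    time both fall within the (assumed) tolerances of the MS1 feature."""
    return [
        s for s in scans
        if abs(ppm_error(s.precursor_mz, feature.mz)) <= ppm_tol
        and abs(s.rt - feature.rt) <= rt_tol
    ]


def composite_spectrum(scans, bin_width=0.01):
    """Step (iii), simplified: sum the intensities of fragments that fall
    into the same fixed-width m/z bin across all matched scans."""
    summed = {}
    for scan in scans:
        for mz, intensity in scan.peaks:
            key = round(mz / bin_width)
            summed[key] = summed.get(key, 0.0) + intensity
    # Return (binned m/z, summed intensity) pairs in ascending m/z order.
    return sorted((key * bin_width, total) for key, total in summed.items())


feature = Feature("M496T300", mz=496.3398, rt=300.0)
scans = [
    Scan(496.3401, 305.0, [(184.07, 100.0), (104.11, 20.0)]),  # matches
    Scan(496.3395, 295.0, [(184.07, 80.0)]),                   # matches
    Scan(496.3500, 300.0, [(184.07, 50.0)]),                   # ~20 ppm off
    Scan(496.3398, 400.0, [(184.07, 50.0)]),                   # 100 s away
]
matched = match_scans(feature, scans)
composite = composite_spectrum(matched)
```

In this toy input, two of the four scans pass both tolerances, and their shared fragment near m/z 184.07 is summed into a single composite peak. Fixed-width binning is only one way to decide that two fragments are "the same"; a tolerance-based clustering would avoid splitting peaks that straddle a bin edge.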
Metabolite identities and comments can also be recorded using an interactive table within compMS2Explorer. The utility of the package is illustrated with a data set of blood serum samples from 7 diet-induced obese (DIO) and 7 nonobese (NO) C57BL/6J mice, which were also treated with an antibiotic (streptomycin) to knock down the gut microbiota. The results of fully autonomous and objective usage of compMS2Miner are presented here. All automatically annotated spectra output by the workflow are provided in the Supporting Information and can alternatively be explored as publicly available compMS2Explorer applications for both positive and negative ionization modes (https://wmbedmands.shinyapps.io/compMS2_mouseSera_POS and https://wmbedmands.shinyapps.io/compMS2_mouseSera_NEG). The workflow provided rapid annotation of a diversity of endogenous and gut-microbially derived metabolites affected by both diet and antibiotic treatment, which conformed to previously published reports. Composite spectra (n = 173) were autonomously matched to entries of the MassBank of North America (MoNA) spectral repository. These experimental and virtual (LipidBlast) spectra corresponded to 29 common endogenous compound classes (e.g., 51 lysophosphatidylcholine spectra) and were then used to calculate the ranking capability of 7 individual scoring metrics. An average of the 7 individual scoring metrics provided the most effective ranking, with a weighted average rank of 3 for the MoNA-matched spectra, despite the potential risk of false positive annotations emerging from automation. In several cases, minor structural differences such as relative carbon–carbon double bond positions affected the rank of the correct MoNA-annotated metabolite.
The latest release and an example workflow (in the package vignette) are available at https://github.com/WMBEdmands/compMS2Miner, and a version of the published application is available on the shinyapps.io site (https://wmbedmands.shinyapps.io/compMS2Example).
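The score-averaging result reported in the abstract can be illustrated with a short Python sketch. It assumes that each candidate annotation carries a list of per-metric scores on a common scale where higher is better. The function names and the toy scores are hypothetical, and the package's actual seven metrics and their weighting are not reproduced here.

```python
from statistics import mean


def rank_candidates(scores):
    """Order candidate annotations by the mean of their metric scores,
    best first. Ties keep their original insertion order."""
    averaged = {name: mean(values) for name, values in scores.items()}
    return sorted(averaged, key=averaged.get, reverse=True)


def rank_of(candidate, scores):
    """1-based rank of one candidate under the mean-score ordering."""
    return rank_candidates(scores).index(candidate) + 1


# Toy example: three candidates, three (hypothetical) metric scores each.
toy_scores = {
    "candidate_A": [0.9, 0.2, 0.4],
    "candidate_B": [0.6, 0.7, 0.8],
    "candidate_C": [0.1, 0.3, 0.2],
}
ordering = rank_candidates(toy_scores)
```

Here candidate_A wins on the first metric alone, but candidate_B has the higher mean and ranks first overall. This is the sense in which an average across metrics can rank more robustly than any single metric, although it still cannot separate candidates whose scores are nearly identical, such as double-bond positional isomers.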

Created:

2017-03-27
