
Exposomics Spectral Library

Indexed by NIAID Data Ecosystem on 2026-03-11
Download link:
https://zenodo.org/record/3755854
Resource summary:
Title: Repurposing Public Metabolomics Datasets for Construction of an Exposomics Spectral Library

Introduction: Publicly archived metabolomics datasets from diverse human biosamples provide an opportunity to repurpose shared data for further exploratory analysis of human health. Although the endogenous metabolome is most often implicated in disease research as a source of biomarkers, the growing role of the exposome in human health underscores the need to identify chemical exposures in human samples. I therefore explored whether previously unreported exposomal compounds (i.e., anthropogenic, industrial, dietary, and microbial chemicals) could be found among the true unknowns in these datasets. Using in silico spectral library matching followed by molecular structure prediction, this study aims to recognize the exposome and to narrow the gap between the number of exposomic substances potentially present in biosamples and the number actually identified.

Methods: Raw GC-MS metabolomics datasets were downloaded from Metabolomics Workbench, GNPS, and MetaboLights using the keywords ‘human’, ‘GC-MS’, ‘serum’, ‘plasma’, ‘muscle’, ‘liver’, and ‘kidney’. The vendor-format mass spectrometry files were converted to .mzML using MSConvertGUI (ProteoWizard) for data processing and for spectral library matching (GOLM, MoNA, FiehnLib, MassBank) in MS-DIAL. For EI-MS spectral annotation, identities were confirmed by the presence of [M−CH3]+, [M+H]+, [M+C2H5]+, and [M+C3H5]+ ions and by Global Natural Products Social (GNPS) molecular networking. Exposomal metabolites were separated from the rest based on their identifiers in the Blood Exposome DB. Truly unassigned spectra were further interrogated with MS-FINDER for structural prediction. Spectra exported in .msp and .txt formats were pooled into a single file for free public download and use.
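The ion-based confirmation step can be illustrated with a short sketch. It computes the expected m/z of the [M−CH3]+, [M+H]+, [M+C2H5]+, and [M+C3H5]+ ions for a candidate's neutral monoisotopic mass and reports which of them appear in a peak list. This is a minimal illustration under assumptions not stated in the source: the neutral mass is known, a fixed 0.01 m/z tolerance is used, and the helper names (`expected_ions`, `matched_ions`) are hypothetical, not part of MS-DIAL or GNPS.

```python
# Hypothetical sketch: check a peak list for the diagnostic ions named in the text.
# Assumptions: the candidate's neutral monoisotopic mass is known and peaks are
# given as a list of m/z values; helper names are illustrative only.

ELECTRON = 0.000548580  # electron mass (u)
H_ATOM = 1.00782503207  # 1H atomic mass (u)
C_ATOM = 12.0           # 12C atomic mass (u), exact by definition


def group_mass(n_c: int, n_h: int) -> float:
    """Monoisotopic mass of a CnHm group."""
    return n_c * C_ATOM + n_h * H_ATOM


# Shift added to the neutral mass M for each cation; every ion has lost one electron.
ION_SHIFTS = {
    "[M-CH3]+": -group_mass(1, 3) - ELECTRON,  # M+. minus a methyl radical
    "[M+H]+": group_mass(0, 1) - ELECTRON,     # proton adduct
    "[M+C2H5]+": group_mass(2, 5) - ELECTRON,  # C2H5+ adduct
    "[M+C3H5]+": group_mass(3, 5) - ELECTRON,  # C3H5+ adduct
}


def expected_ions(neutral_mass: float) -> dict[str, float]:
    """Expected m/z of each diagnostic ion for a given neutral mass."""
    return {name: neutral_mass + shift for name, shift in ION_SHIFTS.items()}


def matched_ions(neutral_mass: float, peaks_mz: list[float], tol: float = 0.01) -> dict[str, float]:
    """Return {ion name: closest observed m/z} for expected ions found within tol."""
    hits = {}
    for name, mz in expected_ions(neutral_mass).items():
        close = [p for p in peaks_mz if abs(p - mz) <= tol]
        if close:
            hits[name] = min(close, key=lambda p: abs(p - mz))
    return hits


if __name__ == "__main__":
    # Made-up neutral mass and peak list, for illustration only.
    hits = matched_ions(200.0, [184.9760, 201.0073, 229.0386, 250.0])
    print(list(hits))  # ['[M-CH3]+', '[M+H]+', '[M+C2H5]+']
```

The 0.01 tolerance assumes accurate-mass data such as ToF or Orbitrap spectra; unit-resolution quadrupole data would need a much wider window, on the order of 0.5 m/z.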
Preliminary data: The 50 pooled GC-MS datasets obtained from the three repositories came from multiple human samples and multiple vendors, and were generated using multiple mass analyzers (single and triple quadrupoles, ToFs, and Orbitraps). The .mzML files were preprocessed (deconvolution, peak picking, and peak alignment), followed by compound identification using MS-DIAL and GNPS tools. Processing parameters were optimized individually for each study. Altogether, using only open-source spectral libraries, approximately 400 compounds of endogenous origin were assigned, each associated with KEGG and HMDB identifiers linked to generic metabolic pathways. Because overlap between the individual spectral libraries was extremely limited, I used a pooled spectral library generated from all available open-source spectral data. A further 350 unassigned spectra (matching scores too low for an assignment, i.e., < 500, with S/N > 25 in each dataset) were interrogated using MS-FINDER and the GNPS molecular networking approach (cosine score > 0.5; balance score > 0.9), which resulted in the annotation of 250 exposomic compounds. ClassyFire was used to assign a hierarchical chemical classification to the exposomal compounds (from their InChIs), indicating diverse origins ranging from medications, industrial chemicals, and pollutants to phytochemicals of dietary origin. The assigned spectra were individually curated by hand and compiled into a single file, the ‘Exposomics Spectral Library’, which is freely available to the public in .txt and .msp formats at 10.5281/zenodo.3755855.
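The triage of unassigned spectra can be sketched as follows: keep spectra whose library match score is below 500 but whose S/N exceeds 25, then connect pairs whose cosine similarity exceeds 0.5, as in a molecular network. Only the thresholds come from the text; everything else is an assumption, including the dictionary layout of a spectrum, the 0.5 m/z pairing tolerance, and the hypothetical helper names. The cosine here is a plain, greedily peak-matched cosine on raw intensities, not GNPS's own modified-cosine implementation, and the balance-score criterion is not modeled.

```python
import math

# Thresholds quoted in the text; data layout and helper names are hypothetical.
MAX_MATCH_SCORE = 500  # best library score below this means "unassigned"
MIN_SN = 25            # spectra must exceed this signal-to-noise ratio
MIN_COSINE = 0.5       # pairs must exceed this cosine to form a network edge


def unassigned(spectra):
    """Keep spectra with a low best library score but a high S/N."""
    return [s for s in spectra if s["match_score"] < MAX_MATCH_SCORE and s["sn"] > MIN_SN]


def cosine(peaks_a, peaks_b, tol=0.5):
    """Plain cosine between peak lists [(mz, intensity), ...].

    Candidate peak pairs within `tol` m/z are taken greedily in order of
    decreasing intensity product; each peak is used at most once.
    """
    candidates = [
        (int_a * int_b, i, j)
        for i, (mz_a, int_a) in enumerate(peaks_a)
        for j, (mz_b, int_b) in enumerate(peaks_b)
        if abs(mz_a - mz_b) <= tol
    ]
    candidates.sort(reverse=True)
    used_a, used_b, dot = set(), set(), 0.0
    for prod, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            dot += prod
    norm_a = math.sqrt(sum(x * x for _, x in peaks_a))
    norm_b = math.sqrt(sum(x * x for _, x in peaks_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def network_edges(spectra, min_cos=MIN_COSINE):
    """All spectrum pairs whose cosine exceeds min_cos, as (id_a, id_b, score)."""
    edges = []
    for a in range(len(spectra)):
        for b in range(a + 1, len(spectra)):
            score = cosine(spectra[a]["peaks"], spectra[b]["peaks"])
            if score > min_cos:
                edges.append((spectra[a]["id"], spectra[b]["id"], round(score, 3)))
    return edges


if __name__ == "__main__":
    # Made-up spectra, for illustration only.
    spectra = [
        {"id": "u1", "match_score": 320, "sn": 40, "peaks": [(73.0, 100.0), (147.0, 50.0), (204.0, 20.0)]},
        {"id": "u2", "match_score": 410, "sn": 30, "peaks": [(73.0, 90.0), (147.0, 60.0), (217.0, 10.0)]},
        {"id": "a1", "match_score": 870, "sn": 80, "peaks": [(73.0, 100.0)]},
        {"id": "n1", "match_score": 200, "sn": 10, "peaks": [(91.0, 100.0)]},
    ]
    print(network_edges(unassigned(spectra)))  # [('u1', 'u2', 0.973)]
```

A square-root intensity transform before the cosine is a common choice for down-weighting dominant peaks; it is left out here to keep the sketch minimal.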
Created:
2020-04-20