HREI-MSDB: High-resolution electron ionization mass spectral database for diverse volatile compounds

Source: Figshare. Updated 2025-07-31; indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/HREI-MSDB_High-resolution_electron_ionization_mass_spectral_database_for_diverse_volatile_compounds/29713460
Description:
This repository contains a database of high-resolution electron ionization (EI) mass spectra recorded under gas chromatography-mass spectrometry (GC-MS) conditions. The vast majority of publicly available GC-MS data sets were obtained using low-resolution mass spectrometry; the few exceptions include the works of E.J. Price (2021) and V. Castro (2022). At the same time, gas chromatography-high-resolution mass spectrometry (GC-HRMS) is widely used in research.

This database aims to provide a GC-HRMS data set covering diverse classes of volatile compounds (trimethylsilyl derivatives are not included), using a wide m/z range (starting from m/z = 40). Mass spectra were recorded using an Orbitrap Exploris GC mass detector (Thermo Fisher Scientific, USA). The mass determination error does not exceed 0.0006 Da, and the mass spectral resolution is 30000. All mass spectra were checked manually; the .zip archives contain information on peak annotations. The data.xlsx file contains a list of compounds and spectra IDs. Peaks with intensity less than 1/999 of the most intense peak were discarded.

The data set includes:
- 130 mass spectra of pure compounds recorded using GC-MS of 10-molecule batches or GC-MS of individual compound solutions.
- 61 mass spectra of compounds included in the 8270 MegaMix standard compound mixture.
- 45 mass spectra of volatile compounds included in lavender essential oil.
- 38 mass spectra of volatile compounds included in mint essential oil.
- 33 mass spectra of volatile compounds included in lemon essential oil.
- 22 mass spectra of volatile compounds included in coffee.

These groups of spectra are designated as Pure samples, 8270 MegaMix Standard, Lavender (essential oil), Mint (essential oil), Lemon (essential oil), and Coffee, respectively, in the data.xlsx file and in the "Comments" tag in the MSP files. Please pay attention to how each spectrum was obtained.
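The intensity cutoff mentioned above (peaks weaker than 1/999 of the most intense peak are discarded) can be illustrated with a short Python sketch. This is not the authors' processing code: the function name `drop_weak_peaks`, the `(m/z, intensity)` tuple layout, and the example peak list are all assumptions made for illustration.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); this layout is an assumption


def drop_weak_peaks(peaks: List[Peak], denominator: int = 999) -> List[Peak]:
    """Keep peaks whose intensity is at least 1/denominator of the base peak.

    Hypothetical helper: it only mirrors the rule stated in the description.
    """
    if not peaks:
        return []
    base_intensity = max(intensity for _, intensity in peaks)
    # Compare intensity * denominator with the base peak instead of dividing,
    # so a peak sitting exactly on the threshold is not lost to rounding.
    return [
        (mz, intensity)
        for mz, intensity in peaks
        if intensity * denominator >= base_intensity
    ]


# Made-up spectrum, not taken from the database.
example_peaks = [(41.0386, 120.0), (43.0542, 999.0), (57.0699, 0.5), (71.0855, 1.0)]
filtered_peaks = drop_weak_peaks(example_peaks)
# The 0.5-intensity peak falls below 999.0 / 999 and is removed.
```

A consequence of this rule is that, when spectra are normalized so that the base peak has intensity 999, the weakest retained peak has intensity 1.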
Identification of compounds in essential oils and coffee is quite reliable, but it was performed without using reference standards. For convenience, in some cases (for essential oils), SMILES strings include stereochemistry symbols, but we cannot be certain which stereoisomer is actually present: often, both the retention indices and the mass spectra of stereoisomers are very close.

Detailed information on the experimental conditions under which the spectra were obtained, on the equipment, and on data processing is given in the info.pdf file. The quality_assessment.xlsx file contains data obtained during quality control of the mass spectra (see the info.pdf file for additional information).

Each file named all_spectra contains all spectra (both those obtained using the sample collection and those obtained from essential oil and coffee samples) in a different file format. Most likely, you need the all_spectra.msp file (NIST-compatible); it contains all the data. The plant_volatiles.msp file contains all mass spectra obtained from essential oils and coffee. The names of the remaining files are self-explanatory. If you need annotations of all peaks or more file formats, see the .zip archives. JCAMP (.jdx) files are in the .zip archives.

Processing (interpretation) of the mass spectra was done using our software: https://github.com/mtshn/gchrmsexplain (versions 0.0.2 and 0.0.3). The settings used are given in the info.pdf file; they are the defaults for the corresponding versions.

Levels of explanation of each peak in the mass spectrum:
- Level 1: the molecular formula is selected, but some isotopic peaks are not found at all.
- Level 2: isotopic peaks merge with other peaks. For example, the 13C peak of some ion X is superimposed (taking the resolution into account) on the main peak of X + H. At moderate resolution, such peaks may not be resolved. This level also includes cases of "incorrect" isotopic peak intensity, that is, intensity differing from the theoretically calculated one.
- Level 3: all main isotopic peaks are observed correctly, within the accuracy of mass determination.

The minimum number of bonds that must be broken to obtain a given fragment is reported without counting the loss of hydrogens or certain other "trivial" bond breaks: the loss of a halogen atom, the loss of a methyl group, and NO loss from a nitro group. Details are given in the documentation of the software used to process the mass spectra: https://github.com/mtshn/gchrmsexplain

In files containing abbreviated interpretations of mass spectra (e.g., in the CSV_annotated folders in the .zip archives), notations like 3-1 are used. The first number denotes the interpretation level (see above), and the second denotes the number of (non-trivial) bond breaks required to obtain such a molecular formula.
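The abbreviated notation described above (for example, 3-1) can be decoded with a few lines of Python. This is a minimal sketch under stated assumptions: the names `PeakExplanation`, `LEVEL_MEANING`, and `parse_explanation` are hypothetical, the level summaries paraphrase the list above, and the sketch does not assume anything about the actual column layout of the CSV_annotated files or about how unexplained peaks are marked there.

```python
from typing import NamedTuple


class PeakExplanation(NamedTuple):
    level: int        # interpretation level: 1, 2, or 3
    bond_breaks: int  # number of non-trivial bond breaks


# Short paraphrases of the level definitions given in the description.
LEVEL_MEANING = {
    1: "formula selected, but some isotopic peaks are not found",
    2: "isotopic peaks merge with other peaks or have unexpected intensity",
    3: "all main isotopic peaks are observed correctly",
}


def parse_explanation(code: str) -> PeakExplanation:
    """Split a 'level-breaks' code such as '3-1' into its two numbers."""
    level_text, separator, breaks_text = code.strip().partition("-")
    if not separator:
        raise ValueError(f"expected 'level-breaks', got {code!r}")
    level = int(level_text)         # raises ValueError on non-numeric input
    bond_breaks = int(breaks_text)
    if level not in LEVEL_MEANING:
        raise ValueError(f"unknown interpretation level: {level}")
    if bond_breaks < 0:
        raise ValueError(f"negative bond-break count: {bond_breaks}")
    return PeakExplanation(level, bond_breaks)


explanation = parse_explanation("3-1")
summary = f"level {explanation.level} ({LEVEL_MEANING[explanation.level]}), " \
          f"{explanation.bond_breaks} non-trivial bond break(s)"
```

Here `parse_explanation("3-1")` yields level 3 with one non-trivial bond break, which could be used, for instance, to keep only peaks explained at the highest level when filtering annotated spectra.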
Created:
2025-07-31