Py-GC/MS Dataset for the Classification of Bacterial Species (Version 1.0)
Source: DataCite Commons. Updated 2026-05-04; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.19659864
Description:
This dataset covers eight bacterial species, including biological warfare agent (BWA) simulants, and spans both Gram-positive and Gram-negative groups:
Bacillus atrophaeus (DSM 675),
Francisella philomiragia (DSM 7535),
Escherichia coli (DSM 500),
Streptococcus mitis (DSM 12643),
Yersinia enterocolitica (DSM 4780),
Acinetobacter baumannii (DSM 25645),
Agromyces mediolanus (DSM 40), and
Staphylococcus epidermidis (DSM 30905).
Samples were analyzed using pyrolysis gas chromatography mass spectrometry (Py-GC/MS), producing 2D GC×MS chromatograms. From these, 26 fatty acid methyl ester (FAME) features were extracted to capture key biochemical characteristics. The dataset covers three concentration levels, resulting in a total of 104 measurements.
Fatty Acid Methyl Esters (FAMEs) with Retention Time, Carbon Number, and Molecular Weight:
| Index | RT (min) | FAME | Carbon Number | MW |
|-------|----------|------|---------------|-----|
| 1 | 8.00 | Methyl undecanoate | C11:0 | 200 |
| 2 | 8.06 | Methyl 2-hydroxydecanoate | C10:0 2-OH | 202 |
| 3 | 8.89 | Methyl dodecanoate | C12:0 | 214 |
| 4 | 9.62 | Methyl tridecanoate | C13:0 | 228 |
| 5 | 9.73 | Methyl 2-hydroxydodecanoate | C12:0 2-OH | 230 |
| 6 | 10.05 | Methyl 3-hydroxydodecanoate | C12:0 3-OH | 230 |
| 7 | 10.45 | Methyl tetradecanoate | C14:0 | 242 |
| 8 | 10.89 | Methyl 13-methyltetradecanoate | C15:0 iso | 256 |
| 9 | 10.97 | Methyl 12-methyltetradecanoate | C15:0 anteiso | 256 |
| 10 | 11.17 | Methyl pentadecanoate | C15:0 | 256 |
| 11 | 11.18 | Methyl 2-hydroxytetradecanoate | C14:0 2-OH | 258 |
| 12 | 11.55 | Methyl 3-hydroxytetradecanoate | C14:0 3-OH | 258 |
| 13 | 11.55 | Methyl 14-methylpentadecanoate | C16:0 iso | 270 |
| 14 | 11.72 | Methyl hexadecenoate (cis-9) | C16:1 | 268 |
| 15* | 11.86 | Methyl hexadecanoate | C16:0 | 270 |
| 16 | 12.26 | Methyl 15-methylhexadecanoate | C17:0 iso | 284 |
| 17 | 12.43 | Methyl cis-9,10-methylenehexadecanoate | C17:0 | 282 |
| 18 | 12.52 | Methyl heptadecanoate | C17:0 | 284 |
| 19 | 12.62 | Methyl 2-hydroxyhexadecanoate | C16:0 2-OH | 286 |
| 20 | 12.90 | Methyl octadecadienoate (all cis-9,12) | C18:2 | 294 |
| 21 | 13.00 | Methyl octadecenoate (cis-9) | C18:1 | 296 |
| 22 | 13.00 | Methyl octadecenoate (trans-9) | C18:1 | 296 |
| 23* | 13.15 | Methyl octadecanoate | C18:0 | 298 |
| 24 | 13.64 | Methyl cis-9,10-methyleneoctadecanoate | C19:0 | 310 |
| 25 | 13.70 | Methyl nonadecanoate | C19:0 | 312 |
| 26 | 14.33 | Methyl eicosanoate | C20:0 | 326 |
*Two FAMEs are starred: methyl hexadecanoate (C16:0, index 15) and methyl octadecanoate (C18:0, index 23). Their values are unusually high relative to the other features, which is attributed to environmental contamination rather than meaningful biological variation. Excluding these two features from model training is therefore recommended, as they may introduce bias and degrade model performance.
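The MW column in the table above appears to list nominal (integer) masses, which can be derived from the carbon-number shorthand. The following is a minimal sketch of that arithmetic, not part of the dataset tooling; the function name `fame_nominal_mass` and its parameters are hypothetical, and it assumes integer atomic masses (C = 12, H = 1, O = 16).

```python
def fame_nominal_mass(carbons, double_bonds=0, hydroxyls=0, rings=0):
    """Nominal (integer) mass of a fatty acid methyl ester (FAME).

    carbons      -- carbons in the fatty acid, including branch or ring carbons
                    (the N in the shorthand CN:M)
    double_bonds -- number of C=C double bonds (the M in CN:M)
    hydroxyls    -- number of hydroxy substituents (e.g. 1 for "2-OH")
    rings        -- number of cyclopropane rings
    """
    # A saturated acid CnH2nO2 becomes the methyl ester C(n+1)H(2n+2)O2.
    c = carbons + 1
    # Each double bond or ring removes two hydrogens.
    h = 2 * carbons + 2 - 2 * (double_bonds + rings)
    # Each hydroxy group adds one oxygen to the two ester oxygens.
    o = 2 + hydroxyls
    return 12 * c + 1 * h + 16 * o


print(fame_nominal_mass(11))                  # methyl undecanoate, C11:0 -> 200
print(fame_nominal_mass(16, double_bonds=1))  # methyl hexadecenoate, C16:1 -> 268
print(fame_nominal_mass(12, hydroxyls=1))     # methyl 2-hydroxydodecanoate -> 230
print(fame_nominal_mass(17, rings=1))         # methyl cis-9,10-methylenehexadecanoate -> 282
```

Branched (iso, anteiso) isomers have the same formula as their straight-chain counterparts, so they need no extra parameter.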
The dataset is organized in HDF5 format, with all samples stored under a hierarchical group named samples. Each sample is represented as an individual group (e.g., sample_00000, sample_00001, etc.), ensuring a consistent and scalable structure.
For every sample, the following data and metadata are included:
2D GC×MS chromatograms (intensity matrix)
m/z axis (mass-to-charge ratio)
retention time axis
FAME features
metadata (original filename, bacterial class, gram type, and concentration level)
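Because the description names the `samples` group and the `sample_00000` pattern but not the individual datasets inside each sample, it may help to list the file's actual layout before relying on any key names. The sketch below assumes the third-party `h5py` library and a local copy of the file. The helper names `walk` and `describe_file`, the `path` argument, and the idea that metadata is stored as HDF5 attributes are all assumptions, not details confirmed by the dataset description.

```python
def walk(node, prefix=""):
    """Yield (path, kind, shape) for every item below `node`.

    Works on any nested mapping. h5py.File and h5py.Group expose the same
    items() interface, and h5py.Dataset exposes a .shape attribute.
    """
    for name, child in node.items():
        path = f"{prefix}/{name}"
        if hasattr(child, "items"):
            # Group-like node: report it, then descend.
            yield path, "group", None
            yield from walk(child, path)
        else:
            # Dataset-like leaf: report its shape if it has one.
            yield path, "dataset", getattr(child, "shape", None)


def describe_file(path, max_samples=1):
    """Print the layout of the first few sample groups in the file."""
    import h5py  # third-party; imported lazily so walk() has no dependencies

    with h5py.File(path, "r") as f:
        # Assumption: global information is stored as root-level attributes.
        print("root attributes:", dict(f.attrs))
        samples = f["samples"]
        for name in sorted(samples)[:max_samples]:
            group = samples[name]
            # Assumption: per-sample metadata is stored as group attributes.
            print(name, "attributes:", dict(group.attrs))
            for item_path, kind, shape in walk(group, f"/samples/{name}"):
                print(item_path, kind, shape)
```

Calling `describe_file` with the path to the downloaded file prints the real dataset and attribute names, which can then be used for direct indexing.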
The dataset includes the following global information:
Dataset name: Identifier of the dataset.
Description: Overview of the dataset content, including chromatograms, FAME features, bacterial classes, gram types, and concentration levels.
Version: Dataset version.
Source: Origin of the dataset.
DOI: Digital Object Identifier of the dataset.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Creators: List of dataset creators.
Contributors: List of dataset contributors.
Contact email: Email address for correspondence and inquiries.
Date: Dataset creation date.
FAME feature names: List of the 26 extracted FAME feature names.
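Given the list of feature names and one sample's FAME vector, the two starred features from the table footnote (methyl hexadecanoate and methyl octadecanoate) can be dropped before training. This is a minimal sketch in plain Python. The helper `drop_starred` is hypothetical, and it assumes the stored feature names match the FAME names in the table; if the file uses different labels, the `STARRED` set needs adjusting.

```python
# Assumption: feature names in the file match the FAME names in the table.
STARRED = {"Methyl hexadecanoate", "Methyl octadecanoate"}


def drop_starred(names, values, starred=STARRED):
    """Return (names, values) with the starred FAME features removed."""
    if len(names) != len(values):
        raise ValueError("names and values must have the same length")
    kept = [(n, v) for n, v in zip(names, values) if n not in starred]
    return [n for n, _ in kept], [v for _, v in kept]


# Illustrative values only, not taken from the dataset.
names = [
    "Methyl tetradecanoate",
    "Methyl hexadecanoate",
    "Methyl heptadecanoate",
    "Methyl octadecanoate",
]
values = [1.0, 50.0, 2.0, 40.0]
kept_names, kept_values = drop_starred(names, values)
print(kept_names)   # ['Methyl tetradecanoate', 'Methyl heptadecanoate']
print(kept_values)  # [1.0, 2.0]
```

Applied to the full 26-feature vector, this leaves 24 features per sample.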
Provider:
Zenodo
Created:
2026-05-04



