
Py-GC/MS Dataset for the Classification of Bacterial Species (Version 1.0)

Source: DataCite Commons | Updated: 2026-05-04 | Indexed: 2026-05-07
Download link:
https://zenodo.org/doi/10.5281/zenodo.19659864
Description:
This dataset covers eight bacterial species, including biological warfare agent (BWA) simulants, and spans both Gram-positive and Gram-negative groups:

- Bacillus atrophaeus (DSM 675)
- Francisella philomiragia (DSM 7535)
- Escherichia coli (DSM 500)
- Streptococcus mitis (DSM 12643)
- Yersinia enterocolitica (DSM 4780)
- Acinetobacter baumannii (DSM 25645)
- Agromyces mediolanus (DSM 40)
- Staphylococcus epidermidis (DSM 30905)

Samples were analyzed by pyrolysis gas chromatography mass spectrometry (Py-GC/MS), producing 2D GC×MS chromatograms. From these, 26 fatty acid methyl ester (FAME) features were extracted to capture key biochemical characteristics. The dataset covers three concentration levels and contains 104 measurements in total.

Fatty acid methyl esters (FAMEs) with retention time, carbon number, and molecular weight:

Index  RT (min)  FAME                                      Carbon number   MW (g/mol)
1      8.00      Methyl undecanoate                        C11:0           200
2      8.06      Methyl 2-hydroxydecanoate                 C10:0 2-OH      202
3      8.89      Methyl dodecanoate                        C12:0           214
4      9.62      Methyl tridecanoate                       C13:0           228
5      9.73      Methyl 2-hydroxydodecanoate               C12:0 2-OH      230
6      10.05     Methyl 3-hydroxydodecanoate               C12:0 3-OH      230
7      10.45     Methyl tetradecanoate                     C14:0           242
8      10.89     Methyl 13-methyltetradecanoate            C15:0 iso       256
9      10.97     Methyl 12-methyltetradecanoate            C15:0 anteiso   256
10     11.17     Methyl pentadecanoate                     C15:0           256
11     11.18     Methyl 2-hydroxytetradecanoate            C14:0 2-OH      258
12     11.55     Methyl 3-hydroxytetradecanoate            C14:0 3-OH      258
13     11.55     Methyl 14-methylpentadecanoate            C16:0 iso       270
14     11.72     Methyl hexadecenoate (cis-9)              C16:1           268
15*    11.86     Methyl hexadecanoate                      C16:0           270
16     12.26     Methyl 15-methylhexadecanoate             C17:0 iso       284
17     12.43     Methyl cis-9,10-methylenehexadecanoate    C17:0 cyclo     282
18     12.52     Methyl heptadecanoate                     C17:0           284
19     12.62     Methyl 2-hydroxyhexadecanoate             C16:0 2-OH      286
20     12.90     Methyl octadecadienoate (all cis-9,12)    C18:2           294
21     13.00     Methyl octadecenoate (cis-9)              C18:1           296
22     13.00     Methyl octadecenoate (trans-9)            C18:1           296
23*    13.15     Methyl octadecanoate                      C18:0           298
24     13.64     Methyl cis-9,10-methyleneoctadecanoate    C19:0 cyclo     310
25     13.70     Methyl nonadecanoate                      C19:0           312
26     14.33     Methyl eicosanoate                        C20:0           326

* Two FAMEs, methyl hexadecanoate (C16:0) and methyl octadecanoate (C18:0), are marked with an asterisk. Their values are unusually high compared with the other features, which is attributed to environmental contamination rather than meaningful biological variation. These two features are therefore not recommended for training, as they may introduce bias and degrade model performance.

The dataset is stored in HDF5 format, with all samples under a hierarchical group named "samples". Each sample is an individual group (e.g., sample_00000, sample_00001, etc.), giving a consistent and scalable structure. For every sample, the following data and metadata are included:

- 2D GC×MS chromatogram (intensity matrix)
- m/z axis (mass-to-charge ratio)
- retention time axis
- FAME features
- metadata (original filename, bacterial class, Gram type, and concentration level)

The dataset also includes the following global information:

- Dataset name: identifier of the dataset.
- Description: overview of the dataset content, including chromatograms, FAME features, bacterial classes, Gram types, and concentration levels.
- Version: dataset version.
- Source: origin of the dataset.
- DOI: Digital Object Identifier of the dataset.
- License: Creative Commons Attribution 4.0 International (CC BY 4.0).
- Creators: list of dataset creators.
- Contributors: list of dataset contributors.
- Contact email: email address for correspondence and inquiries.
- Date: dataset creation date.
- FAME feature names: list of the 26 extracted FAME feature names.
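The MW column of the FAME table follows directly from the shorthand notation, which makes it easy to sanity-check. A saturated straight-chain methyl ester with n acyl carbons has the formula C(n+1)H(2n+2)O2, so with integer atomic masses (C = 12, H = 1, O = 16) its nominal mass is 12(n+1) + (2n+2) + 32 = 14n + 46. Each double bond or cyclopropane ring removes two hydrogens (-2), each hydroxyl group adds one oxygen (+16), and iso/anteiso methyl branching leaves the mass unchanged. The minimal Python sketch below encodes that rule; the function names are hypothetical, and the parser only understands shorthand of the form used in the table (e.g. C16:1, C14:0 3-OH), with "cyclo" assumed as the marker for a cyclopropane ring.

```python
import re

# Shorthand such as "C16:1", "C14:0 3-OH", "C15:0 anteiso" or "C17:0 cyclo".
# Assumption: the word "cyclo" marks a cyclopropane ring.
_SHORTHAND = re.compile(r"C(\d+):(\d+)")


def nominal_fame_mw(carbons: int, double_bonds: int = 0,
                    rings: int = 0, hydroxyls: int = 0) -> int:
    """Nominal (integer) molecular weight of a fatty acid methyl ester.

    A saturated methyl ester with n acyl carbons is C(n+1)H(2n+2)O2:
    12(n+1) + (2n+2) + 2*16 = 14n + 46.
    Each double bond or ring removes two hydrogens (-2); each hydroxyl
    adds one oxygen (+16). Iso/anteiso branching does not change the mass.
    """
    return 14 * carbons + 46 - 2 * (double_bonds + rings) + 16 * hydroxyls


def mw_from_shorthand(label: str) -> int:
    """Parse a shorthand label (hypothetical helper) and return its nominal MW."""
    match = _SHORTHAND.match(label)
    if match is None:
        raise ValueError(f"unrecognised FAME shorthand: {label!r}")
    carbons = int(match.group(1))
    double_bonds = int(match.group(2))
    rings = 1 if "cyclo" in label else 0
    hydroxyls = label.count("-OH")
    return nominal_fame_mw(carbons, double_bonds, rings, hydroxyls)
```

For example, mw_from_shorthand("C19:0") returns 312 and mw_from_shorthand("C17:0 cyclo") returns 282. Working from the formula rather than a lookup table means a single rule covers every row, so any row that disagrees with it stands out immediately.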
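Following the recommendation to leave the two starred FAMEs (methyl hexadecanoate and methyl octadecanoate) out of training, a per-sample feature vector can be filtered by name before it reaches a model. The sketch below assumes that the strings in the file's "FAME feature names" list match the FAME names in the table exactly; that is not documented on this page, so the optional strict check raises an error if an excluded name is not found. The constant and function names are hypothetical.

```python
from typing import Collection, List, Sequence, Tuple

# FAMEs starred in the table: their high values are attributed to
# environmental contamination, so they are not recommended for training.
# Assumption: the stored feature names use exactly these strings.
CONTAMINATED_FAMES = frozenset({"Methyl hexadecanoate", "Methyl octadecanoate"})


def drop_contaminated(
    names: Sequence[str],
    values: Sequence[float],
    excluded: Collection[str] = CONTAMINATED_FAMES,
    strict: bool = True,
) -> Tuple[List[str], List[float]]:
    """Remove the excluded features from one sample, keeping the original order.

    With strict=True, a KeyError is raised if an excluded name is absent,
    which catches a mismatch between these strings and the stored names.
    """
    if len(names) != len(values):
        raise ValueError("names and values must have the same length")
    if strict:
        missing = set(excluded) - set(names)
        if missing:
            raise KeyError(f"excluded features not found: {sorted(missing)}")
    kept_names: List[str] = []
    kept_values: List[float] = []
    for name, value in zip(names, values):
        if name not in excluded:
            kept_names.append(name)
            kept_values.append(value)
    return kept_names, kept_values
```

Filtering by name rather than by position (rows 15 and 23 of the table) keeps the filter correct even if the feature order stored in a file differs from the table order.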
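Because the sample groups under "samples" use zero-padded names (sample_00000, sample_00001, ...), sorting the names as strings also sorts them numerically. The sketch below walks the sample groups in that order. It relies only on the mapping interface that h5py groups expose (keys() and item access), so it can be called on an open h5py.File; the names of the datasets and attributes inside each sample group are not documented on this page, so it deliberately stops at the group level. The function name and the exact five-digit pattern (inferred from the sample_00000 example) are assumptions.

```python
import re
from typing import Any, Iterator, Mapping, Tuple

# Assumption: sample groups are named "sample_" + a five-digit zero-padded
# index, as in the examples sample_00000 and sample_00001.
_SAMPLE_NAME = re.compile(r"sample_(\d{5})")


def iter_sample_groups(root: Mapping[str, Any]) -> Iterator[Tuple[int, str, Any]]:
    """Yield (index, name, group) for each sample group under root["samples"].

    `root` can be an open h5py.File or any mapping with the same layout.
    Zero padding makes lexicographic order equal to numeric order.
    Names that do not match the expected pattern are skipped.
    """
    samples = root["samples"]
    for name in sorted(samples.keys()):
        match = _SAMPLE_NAME.fullmatch(name)
        if match is None:
            continue
        yield int(match.group(1)), name, samples[name]
```

With h5py installed, this would be used as `with h5py.File(path, "r") as f:` followed by `for idx, name, group in iter_sample_groups(f): ...`, where path points to the downloaded HDF5 file. Because the function only needs a mapping, it can also be exercised with plain nested dictionaries.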
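Each sample pairs its 2D intensity matrix with a retention time axis and an m/z axis, so a retention time from the FAME table (for instance 11.86 min for methyl hexadecanoate) can be mapped to a scan index. A common first look at such data is the total ion chromatogram (TIC), obtained by summing over the m/z axis. The sketch below uses plain lists and assumes the matrix has one row per retention-time scan and one column per m/z value, and that the retention time axis is in minutes and sorted ascending; the actual orientation and units in the file are not stated here and should be checked against the axis lengths. The function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def total_ion_chromatogram(intensity: Sequence[Sequence[float]]) -> List[float]:
    """Sum intensities over m/z for each retention-time scan.

    Assumption: intensity[i][j] is scan i (retention time) at m/z index j.
    """
    return [float(sum(row)) for row in intensity]


def nearest_scan(rt_axis: Sequence[float], rt_min: float) -> int:
    """Index of the scan whose retention time is closest to rt_min.

    Assumption: rt_axis is sorted ascending and expressed in minutes.
    """
    if not rt_axis:
        raise ValueError("empty retention time axis")
    i = bisect_left(rt_axis, rt_min)
    if i == 0:
        return 0
    if i == len(rt_axis):
        return len(rt_axis) - 1
    # Choose the closer neighbour; ties go to the earlier scan.
    return i if rt_axis[i] - rt_min < rt_min - rt_axis[i - 1] else i - 1
```

With NumPy arrays the same steps would be `intensity.sum(axis=1)` and `np.searchsorted(rt_axis, rt_min)` followed by the same neighbour comparison; the list version is shown only to keep the sketch dependency-free.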
Provided by:
Zenodo
Created:
2026-05-04