
food-ai-nexus/salmonella-serovar-hyperspectral-spectra

Hugging Face · Updated 2026-04-01 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/food-ai-nexus/salmonella-serovar-hyperspectral-spectra
Resource description:
---
license: cc-by-4.0
language:
- en
pretty_name: Salmonella Serovar Hyperspectral Spectra (Foods 2025)
tags:
- food-safety
- salmonella
- hyperspectral
- tabular-classification
- spectroscopy
task_categories:
- tabular-classification
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train.parquet
  - split: test
    path: data/test.parquet
---

# Salmonella Serovar Hyperspectral Spectra (Foods 2025)

Salmonella Serovar Hyperspectral Spectra is a tabular dataset of single-cell spectral features for foodborne bacterial classification. It was created to support research in rapid pathogen identification, enabling models to classify *Salmonella* serovars using hyperspectral signatures extracted from individual bacterial cells.

> **Companion image dataset:** The RGB composite microscopy images from which these spectra were extracted are available at [`food-ai-nexus/salmonella-serovar-hyperspectral`](https://huggingface.co/datasets/food-ai-nexus/salmonella-serovar-hyperspectral).

This dataset accompanies the publication:

Papa, M., Bhattacharya, S., Park, B., & Yi, J. (2025). Rapid *Salmonella* Serovar Classification Using AI-Enabled Hyperspectral Microscopy with Enhanced Data Preprocessing and Multimodal Fusion. *Foods*, 14(15), 2737. doi: [10.3390/foods14152737](https://doi.org/10.3390/foods14152737)

## Dataset Description

Each row represents one bacterial cell segmented from a hyperspectral data cube (hypercube). Spectra are Standard Normal Variate (SNV)-normalized mean single-cell spectra across 303 wavebands (399–1000 nm, 2 nm bandwidth), extracted using an attention-gated residual U-Net (ARG2U-Net).

```python
from datasets import load_dataset

ds = load_dataset("food-ai-nexus/salmonella-serovar-hyperspectral-spectra")
# ds['train'] → 18,180 rows | ds['test'] → 7,792 rows
```

## Splits

The 70/30 train/test split is performed at the row level, stratified by `Serovar` (seed=42), mirroring the paper's reported methodology.
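As a rough illustration of what a row-level stratified 70/30 split does, here is a minimal pure-Python sketch. The function name `stratified_split`, the per-class rounding rule, and the toy label counts are assumptions made for this example; the dataset card does not say which tool produced the published split, so this will not reproduce the shipped `train` and `test` files, which should be used as provided.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx), splitting each label group at roughly train_fraction."""
    rng = random.Random(seed)

    # Group row indices by their class label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Assumption: round each class independently; other tools allocate differently.
        cut = round(train_fraction * len(indices))
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels with hypothetical counts, not the real dataset.
labels = ["Enteritidis"] * 10 + ["Kentucky"] * 20
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting within each label group keeps the class proportions of the full table approximately equal in both partitions, which is why the per-serovar row counts scale by about 70% and 30%.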
| Split | Rows | Notes |
| :--- | ---: | :--- |
| `train` | 18,180 | 70% stratified by serovar |
| `test` | 7,792 | 30% stratified by serovar |

## Schema

| Column | Type | Description |
| :--- | :--- | :--- |
| `InImage_ID` | int | Per-serovar cell index identifying the source hypercube |
| `Band_2_W_401.00` … `Band_303_W_1000.90` | float | SNV-normalized mean spectral reflectance at each waveband (nm) |
| `Serovar` | string | Target label: one of `Enteritidis`, `I4`, `Infantis`, `Johannesburg`, `Kentucky` |

> **Important note on `InImage_ID`:** This index identifies the source hypercube **within each serovar group**, not globally. It cannot be used as a direct foreign key to join rows to specific files in the companion image dataset.

## Classes

Five *Salmonella* serovars selected based on their prevalence in foodborne illness outbreaks:

| Label | Serovar | Train Rows | Test Rows |
| :--- | :--- | ---: | ---: |
| `Enteritidis` | *S.* Enteritidis | 3,731 | 1,600 |
| `I4` | *S.* 4,[5],12:i:- | 3,638 | 1,559 |
| `Infantis` | *S.* Infantis | 5,201 | 2,229 |
| `Johannesburg` | *S.* Johannesburg | 3,265 | 1,399 |
| `Kentucky` | *S.* Kentucky | 2,345 | 1,005 |

> **Note on class imbalance:** The spectra are inherently imbalanced because different serovars yield different numbers of segmentable cells per hypercube. This reflects biological variation in cell density and morphology, not a sampling artifact.

> **Known data issue:** One cell record (`InImage_ID=73`, Enteritidis) has NaN values for bands 148–303 (wavelengths 692–1001 nm) in the original Zenodo source CSV. This row is preserved as-is to maintain source fidelity. Users should apply appropriate imputation or filtering before training.
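One way to handle the incomplete record noted above is to drop any row whose band columns contain a missing value. The sketch below is a minimal example, assuming every spectral column name starts with `Band_` (as in the schema) and that missing values appear as NaN or `None`; the helper name `has_complete_spectrum` and the toy values are hypothetical.

```python
import math


def has_complete_spectrum(row, band_prefix="Band_"):
    """Return True if every band column in the row holds a finite number."""
    for name, value in row.items():
        if not name.startswith(band_prefix):
            continue  # skip InImage_ID, Serovar, and any other non-band column
        if value is None or not math.isfinite(value):
            return False
    return True


# Toy rows: column names follow the schema, values are made up for illustration.
rows = [
    {"InImage_ID": 1, "Band_2_W_401.00": 0.12, "Band_303_W_1000.90": -0.40, "Serovar": "Kentucky"},
    {"InImage_ID": 73, "Band_2_W_401.00": 0.30, "Band_303_W_1000.90": float("nan"), "Serovar": "Enteritidis"},
]
clean_rows = [row for row in rows if has_complete_spectrum(row)]
print(len(clean_rows))
```

With the `datasets` library, the same predicate could be passed to `Dataset.filter`, for example `ds["train"].filter(has_complete_spectrum)`, and applied to both splits. Filtering removes only one cell; imputation is the alternative if every record must be kept.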
## Baseline Performance

| Model | Modality | Test Accuracy |
| :--- | :--- | ---: |
| PCA-MLP | Spectral only | 81.1% |
| PCA-MLP + EfficientNetV2 | Multimodal fusion (Image + Spectra) | **82.4%** |

## License

This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License.

## Citation

```bibtex
@article{papa2025salmonella,
  title   = {Rapid Salmonella Serovar Classification Using AI-Enabled Hyperspectral Microscopy with Enhanced Data Preprocessing and Multimodal Fusion},
  author  = {Papa, MeiLi and Bhattacharya, Siddhartha and Park, Bosoon and Yi, Jiyoon},
  journal = {Foods},
  volume  = {14},
  number  = {15},
  pages   = {2737},
  year    = {2025},
  doi     = {10.3390/foods14152737}
}
```

## Source

Original dataset: [Zenodo 10.5281/zenodo.16740800](https://zenodo.org/records/16740800)

Code repository: [GitHub food-ai-engineering-lab/salmonella-serovar-classification-foods](https://github.com/food-ai-engineering-lab/salmonella-serovar-classification-foods)
Provider:
food-ai-nexus