food-ai-nexus/chicken-salmonella-campylobacter-us-facilities
Source: Hugging Face. Updated 2026-03-26; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/food-ai-nexus/chicken-salmonella-campylobacter-us-facilities
---
license: mit
task_categories:
- tabular-classification
tags:
- food-safety
- agriculture
- microbiology
language:
- en
size_categories:
- 1K<n<10K
pretty_name: Salmonella and Campylobacter in Raw Chicken Carcass (US)
---
**Salmonella and Campylobacter in Raw Chicken Carcass (US)** is a tabular dataset containing meteorological and temporal data for raw chicken carcass samples tested for the presence of *Salmonella* and *Campylobacter* across the United States.
With this dataset, researchers can train machine learning models to predict the presence of *Salmonella* and *Campylobacter* in raw chicken carcasses based on environmental, temporal, and geographical predictors.
# Content
* The dataset contains 4,887 raw chicken carcass samples collected from USDA FSIS-regulated establishments across 36 US states.
* It spans 71 columns covering establishment identifiers, geographic coordinates, temporal features, and meteorological data (temperature, precipitation, humidity, wind, pressure, etc.) for the day of collection and the three days prior.
* The dataset includes two main target variables for binary classification: `SalmonellaSPAnalysis` and `CampylobacterAnalysis30ml`.
* This dataset was sourced from "Dataset: Raw Poultry (Current)" at the USDA FSIS website. It includes only samples categorized as "Animal-Chicken-Broiler / Young Chicken Carcass Rinse" because, at the time of data acquisition in October 2023, testing results for the other raw poultry product categories had not been merged.
# Data Fields
The dataset contains 71 columns.
**Identifier**
| Column | Description |
| --- | --- |
| `EstablishmentID` | A unique numeric identifier for the FSIS-regulated establishment. This is an optional grouping variable useful for facility-level stratification (e.g., train/test splits by establishment); it is not a predictive feature. |
**Target Variables**
| Column | Description |
| --- | --- |
| `SalmonellaSPAnalysis` | The result of the analysis for *Salmonella* species in the sample (Positive / Negative). |
| `CampylobacterAnalysis30ml` | The result of the analysis for *Campylobacter* species in the sample tested using the enrichment method (Positive / Negative). |
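
For modelling, the Positive / Negative results usually need to be mapped to 0/1. The following is a minimal sketch, assuming the two target columns hold exactly the strings `Positive` and `Negative` as described in the table above. The `encode_targets` helper and the toy rows are hypothetical and are not part of the dataset.

```python
import pandas as pd

TARGETS = ["SalmonellaSPAnalysis", "CampylobacterAnalysis30ml"]
LABEL_MAP = {"Negative": 0, "Positive": 1}  # assumed label spelling


def encode_targets(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with both target columns mapped to 0/1."""
    out = df.copy()
    for col in TARGETS:
        encoded = out[col].map(LABEL_MAP)
        # map() yields NaN for any value outside LABEL_MAP, so fail loudly.
        if encoded.isna().any():
            raise ValueError(f"unexpected label in {col}")
        out[col] = encoded.astype(int)
    return out


# Toy rows standing in for the real data (values are made up).
toy = pd.DataFrame({
    "SalmonellaSPAnalysis": ["Negative", "Positive", "Negative"],
    "CampylobacterAnalysis30ml": ["Positive", "Negative", "Negative"],
})
encoded = encode_targets(toy)
print(encoded.mean())  # share of positive samples per target
```

If the real columns contain missing or differently spelled values, the helper raises instead of silently producing NaN labels, which makes such cases easy to spot before training.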
**Sample Information**
| Column | Description |
| --- | --- |
| `State` | The US state where the establishment is located (36 unique states). |
| `Longitude` | Longitude of the establishment. |
| `Latitude` | Latitude of the establishment. |
| `Numerical_Weekday` | Day of the week when the sample was collected, encoded numerically (1 = Monday through 7 = Sunday). |
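
Because `Numerical_Weekday` is cyclic (7 = Sunday is adjacent to 1 = Monday), some models may benefit from a sine/cosine encoding instead of the raw integer. This is an optional modelling choice rather than something the dataset prescribes; `weekday_to_cyclic` and `circle_distance` are hypothetical helpers used only for illustration.

```python
import math


def weekday_to_cyclic(weekday: int) -> tuple[float, float]:
    """Map 1 (Monday) .. 7 (Sunday) onto a point on the unit circle."""
    if not 1 <= weekday <= 7:
        raise ValueError("weekday must be in 1..7")
    angle = 2 * math.pi * (weekday - 1) / 7
    return math.sin(angle), math.cos(angle)


def circle_distance(a: int, b: int) -> float:
    """Euclidean distance between two encoded weekdays."""
    (sa, ca), (sb, cb) = weekday_to_cyclic(a), weekday_to_cyclic(b)
    return math.hypot(sa - sb, ca - cb)


# After encoding, Sunday (7) is as close to Monday (1) as Tuesday (2) is.
print(circle_distance(7, 1), circle_distance(1, 2))
```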
**Meteorological Features (Day 0 to Day 3)**
| Column | Description |
| --- | --- |
| `AverageTemp_Day0` to `AverageTemp_Day3` | Average temperature on the day of collection (Day0) and on each of the three preceding days (Day1 to Day3); obtained from visualcrossing.com. |
| `CloudCover_Day0` to `CloudCover_Day3` | Cloud cover on the day of collection (Day0) and on each of the three preceding days (Day1 to Day3); obtained from visualcrossing.com. |
| `DEW_Day0` to `DEW_Day3` | Dew point on the day of collection (Day0) and on each of the three preceding days (Day1 to Day3); obtained from visualcrossing.com. |
| `Humidity_Day0` to `Humidity_Day3` | Humidity on the day of collection (Day0) and on each of the three preceding days (Day1 to Day3); obtained from visualcrossing.com. |
| `MaxTemp_Day0` to `MaxTemp_Day3` | Maximum temperature on the day of collection (Day0) and on each of the three preceding days (Day1 to Day3); obtained from visualcrossing.com. |
| `MinTemp_Day0` to `MinTemp_Day3` | Minimum temperature on the day of collection (Day0) and on each of the three preceding days (Day1 to Day3); obtained from visualcrossing.com. |
| `PrecipType_Day0` to `PrecipType_Day3` | Precipitation type on the day of collection (Day0) and on each of the three preceding days (Day1 to Day3); obtained from visualcrossing.com. |
| `Precipitation_Day0` to `Precipitation_Day3` | Precipitation amount on the day of collection (Day0) and on each of the three preceding days (Day1 to Day3); obtained from visualcrossing.com. |
| `Pressure_Day0` to `Pressure_Day3` | Air pressure on the day of collection (Day0) and on each of the three preceding days (Day1 to Day3); obtained from visualcrossing.com. |
| `Snow_Day0` to `Snow_Day3` | Snow level on the day of collection (Day0) and on each of the three preceding days (Day1 to Day3); obtained from visualcrossing.com. |
| `SnowDepth_Day0` to `SnowDepth_Day3` | Snow depth on the day of collection (Day0) and on each of the three preceding days (Day1 to Day3); obtained from visualcrossing.com. |
| `Solar Radiation_Day0` to `Solar Radiation_Day3` | Solar radiation on the day of collection (Day0) and on each of the three preceding days (Day1 to Day3); obtained from visualcrossing.com. Note that these column names contain a space. |
| `Visibility_Day0` to `Visibility_Day3` | Visibility on the day of collection (Day0) and on each of the three preceding days (Day1 to Day3); obtained from visualcrossing.com. |
| `WindDirection_Day0` to `WindDirection_Day3` | Wind direction on the day of collection (Day0) and on each of the three preceding days (Day1 to Day3); obtained from visualcrossing.com. |
| `WindGust_Day0` to `WindGust_Day3` | Wind gust on the day of collection (Day0) and on each of the three preceding days (Day1 to Day3); obtained from visualcrossing.com. |
| `WindSpeed_Day0` to `WindSpeed_Day3` | Wind speed on the day of collection (Day0) and on each of the three preceding days (Day1 to Day3); obtained from visualcrossing.com. |
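
The column ranges in the table above expand to 16 variables times 4 days. The sketch below builds the full list of names from the prefixes documented here and checks the arithmetic against the stated total of 71 columns; the variable names `WEATHER_VARIABLES`, `weather_columns`, and `other_columns` are illustrative only.

```python
# Prefixes copied from the meteorological table above.
WEATHER_VARIABLES = [
    "AverageTemp", "CloudCover", "DEW", "Humidity", "MaxTemp", "MinTemp",
    "PrecipType", "Precipitation", "Pressure", "Snow", "SnowDepth",
    "Solar Radiation",  # this prefix contains a space
    "Visibility", "WindDirection", "WindGust", "WindSpeed",
]
DAYS = range(4)  # Day0 = collection day, Day1..Day3 = preceding days

weather_columns = [f"{var}_Day{day}" for var in WEATHER_VARIABLES for day in DAYS]

# Identifier, targets, and sample-information columns from the other tables.
other_columns = [
    "EstablishmentID",
    "SalmonellaSPAnalysis", "CampylobacterAnalysis30ml",
    "State", "Longitude", "Latitude", "Numerical_Weekday",
]

print(len(weather_columns))                       # 64 (16 variables x 4 days)
print(len(weather_columns) + len(other_columns))  # 71
```

A list like `weather_columns` can be compared against `df.columns` after loading to confirm that the file matches this documentation.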
# Uses
The dataset can be used to train machine learning models (e.g., neural networks, decision trees, gradient boosting, KNN, SVM, logistic regression) to predict the presence of *Salmonella* and *Campylobacter* in raw chicken carcasses. It can also support research in food safety, agricultural microbiology, and the effect of weather on foodborne pathogens.
Use the **"Use this dataset"** button at the top of the page to load the dataset into your preferred library. To prepare the data for binary classification, use the `SalmonellaSPAnalysis` or `CampylobacterAnalysis30ml` column as the target label. The `EstablishmentID` column can be used to split train and test sets by establishment, so that samples from one establishment do not end up on both sides of the split, but it should be excluded from the feature matrix.
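```python
import pandas as pd  # to_pandas() below requires pandas to be installed
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub.
ds = load_dataset("food-ai-nexus/chicken-salmonella-campylobacter-us-facilities")
# Convert the "train" split to a pandas DataFrame.
df = ds["train"].to_pandas()
```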
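
Once the data is in a DataFrame, a split by establishment might look like the sketch below. It uses only pandas and NumPy on a made-up toy frame; `split_by_establishment`, its `test_fraction` and `seed` parameters, and the toy values are assumptions for illustration, not the dataset authors' method. scikit-learn's `GroupShuffleSplit` offers comparable group-aware splitting if that library is available.

```python
import numpy as np
import pandas as pd


def split_by_establishment(df, test_fraction=0.2, seed=0):
    """Split rows so that no EstablishmentID appears in both train and test."""
    rng = np.random.default_rng(seed)
    # Shuffle the unique establishment IDs, then hold out a fraction of them.
    ids = rng.permutation(df["EstablishmentID"].unique())
    n_test = max(1, round(len(ids) * test_fraction))
    in_test = df["EstablishmentID"].isin(ids[:n_test])
    return df[~in_test], df[in_test]


# Toy frame standing in for the real data (values are made up).
toy = pd.DataFrame({
    "EstablishmentID": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "Humidity_Day0": [60, 62, 71, 70, 55, 58, 80, 79, 66, 65],
    "SalmonellaSPAnalysis": [0, 0, 1, 0, 0, 0, 1, 1, 0, 0],
})
train, test = split_by_establishment(toy)

# Exclude the grouping column and the target from the feature matrix.
# With the real data, drop both target columns here.
X_train = train.drop(columns=["EstablishmentID", "SalmonellaSPAnalysis"])
y_train = train["SalmonellaSPAnalysis"]
print(len(train), len(test))
```

Holding out whole establishments gives an estimate of how a model performs on facilities it has never seen, which a purely random row split would not.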
# License
This dataset is licensed under the MIT License. It is intended for research and educational use.
# Reference
This dataset was sourced from "Dataset: Raw Poultry (Current)" at the USDA FSIS website:
https://www.fsis.usda.gov/news-events/publications/raw-poultry-sampling
Provider:
food-ai-nexus



