
food-ai-nexus/chicken-salmonella-campylobacter-us-facilities

Hugging Face · Updated 2026-03-26 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/food-ai-nexus/chicken-salmonella-campylobacter-us-facilities
Description:
---
license: mit
task_categories:
- tabular-classification
tags:
- food-safety
- agriculture
- microbiology
language:
- en
size_categories:
- 1K<n<10K
pretty_name: Salmonella and Campylobacter in Raw Chicken Carcass (US)
---

**Salmonella and Campylobacter in Raw Chicken Carcass (US)** is a tabular dataset containing meteorological and temporal data for raw chicken carcass samples tested for the presence of *Salmonella* and *Campylobacter* across the United States. With this dataset, researchers can train machine learning models to predict the presence of *Salmonella* and *Campylobacter* in raw chicken carcasses from environmental, temporal, and geographical predictors.

# Content

* The dataset contains 4,887 raw chicken carcass samples collected from USDA FSIS-regulated establishments across 36 US states.
* It spans 71 columns covering establishment identifiers, geographic coordinates, temporal features, and meteorological data (temperature, precipitation, humidity, wind, pressure, etc.) for the day of collection and the three days prior.
* The dataset includes two main target variables for binary classification: `SalmonellaSPAnalysis` and `CampylobacterAnalysis30ml`.
* This dataset was sourced from "Dataset: Raw Poultry (Current)" at the USDA FSIS website. It includes only the samples categorized as "Animal-Chicken-Broiler / Young Chicken Carcass Rinse" because, at the time of data acquisition in October 2023, testing results for other categories of raw poultry product had not been merged.

# Data Fields

The dataset contains 71 columns: 1 identifier, 2 target variables, 4 sample-information columns, and 64 meteorological columns (16 features × 4 days).

**Identifier**

| Column | Description |
| --- | --- |
| `EstablishmentID` | A unique numeric identifier for the FSIS-regulated establishment. This is an optional grouping variable useful for facility-level stratification (e.g., train/test splits by establishment); it is not a predictive feature. |

**Target Variables**

| Column | Description |
| --- | --- |
| `SalmonellaSPAnalysis` | The result of the analysis for *Salmonella* species in the sample (Positive / Negative). |
| `CampylobacterAnalysis30ml` | The result of the analysis for *Campylobacter* species in the sample tested using the enrichment method (Positive / Negative). |

**Sample Information**

| Column | Description |
| --- | --- |
| `State` | The US state where the establishment is located (36 unique states). |
| `Longitude` | Longitude of the establishment. |
| `Latitude` | Latitude of the establishment. |
| `Numerical_Weekday` | Day of the week when the sample was collected, encoded numerically (1 = Monday through 7 = Sunday). |

**Meteorological Features (Day 0 to Day 3)**

Each feature has four columns, suffixed `_Day0` (the day of collection) through `_Day3` (three days before collection). All values were obtained from visualcrossing.com.

| Column | Description |
| --- | --- |
| `AverageTemp_Day0` to `AverageTemp_Day3` | Average temperature. |
| `CloudCover_Day0` to `CloudCover_Day3` | Cloud cover. |
| `DEW_Day0` to `DEW_Day3` | Dew point. |
| `Humidity_Day0` to `Humidity_Day3` | Humidity. |
| `MaxTemp_Day0` to `MaxTemp_Day3` | Maximum temperature. |
| `MinTemp_Day0` to `MinTemp_Day3` | Minimum temperature. |
| `PrecipType_Day0` to `PrecipType_Day3` | Precipitation type. |
| `Precipitation_Day0` to `Precipitation_Day3` | Precipitation amount. |
| `Pressure_Day0` to `Pressure_Day3` | Air pressure. |
| `Snow_Day0` to `Snow_Day3` | Snow level. |
| `SnowDepth_Day0` to `SnowDepth_Day3` | Snow depth. |
| `Solar Radiation_Day0` to `Solar Radiation_Day3` | Solar radiation. |
| `Visibility_Day0` to `Visibility_Day3` | Visibility. |
| `WindDirection_Day0` to `WindDirection_Day3` | Wind direction. |
| `WindGust_Day0` to `WindGust_Day3` | Wind gust. |
| `WindSpeed_Day0` to `WindSpeed_Day3` | Wind speed. |

# Uses

The dataset can be used to train machine learning models (e.g., Neural Networks, Decision Trees, Gradient Boosting, KNN, SVM, Logistic Regression) to predict the presence of *Salmonella* and *Campylobacter* in raw chicken carcasses. It can also be used in research areas such as food safety, agricultural microbiology, and meteorological impact studies on foodborne pathogens.

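The meteorological columns follow a `<Feature>_Day<k>` naming pattern and the targets are recorded as Positive / Negative, so the feature list and a 0/1 label can be assembled programmatically. The following is a minimal sketch under those assumptions; the helper names `weather_columns` and `encode_target` are hypothetical, and the exact label strings (`"Positive"`, `"Negative"`) are assumed from the field descriptions rather than verified against the file.

```python
import pandas as pd

# The 16 meteorological feature prefixes listed in the Data Fields table.
WEATHER_PREFIXES = [
    "AverageTemp", "CloudCover", "DEW", "Humidity", "MaxTemp", "MinTemp",
    "PrecipType", "Precipitation", "Pressure", "Snow", "SnowDepth",
    "Solar Radiation", "Visibility", "WindDirection", "WindGust", "WindSpeed",
]


def weather_columns(prefixes=WEATHER_PREFIXES, days=range(4)):
    """Build the expected column names, e.g. 'AverageTemp_Day0' ... 'WindSpeed_Day3'."""
    return [f"{prefix}_Day{day}" for prefix in prefixes for day in days]


def encode_target(series):
    """Map 'Positive' -> 1 and 'Negative' -> 0.

    Assumption: labels are spelled exactly this way. Any other value
    becomes <NA>, so unexpected labels are visible instead of silently kept.
    """
    return series.map({"Positive": 1, "Negative": 0}).astype("Int64")


# Tiny synthetic example (not real data from the dataset).
labels = pd.Series(["Positive", "Negative", "Positive"])
encoded = encode_target(labels)
columns = weather_columns()
```

Checking `set(weather_columns()) - set(df.columns)` on the loaded frame is a quick way to confirm that the actual column names match this pattern before training.
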
Use the **"Use this dataset"** button at the top of the page to load the dataset into your preferred library. To prepare the data for binary classification, use the `SalmonellaSPAnalysis` or `CampylobacterAnalysis30ml` columns as target labels. The `EstablishmentID` column can be used to perform establishment-stratified train/test splits but should be excluded from the feature matrix.

```python
import pandas as pd
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub
ds = load_dataset("food-ai-nexus/chicken-salmonella-campylobacter-us-facilities")

# Convert the "train" split to a pandas DataFrame
df = ds["train"].to_pandas()
```

# License

This dataset is licensed under the MIT License. It is intended for research and educational use.

# Reference

This dataset was sourced from "Dataset: Raw Poultry (Current)" at the USDA FSIS website: https://www.fsis.usda.gov/news-events/publications/raw-poultry-sampling
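
The establishment-level split described under Uses can be sketched with scikit-learn's `GroupShuffleSplit`, which keeps all samples from one establishment on the same side of the split. This is one possible approach, not necessarily the one the dataset authors used; the helper name `split_by_establishment`, the 80/20 ratio, and the small synthetic frame are assumptions made for illustration.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit


def split_by_establishment(df, group_col="EstablishmentID", test_size=0.2, seed=0):
    """Split rows so that no establishment appears in both train and test."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(df, groups=df[group_col]))
    return df.iloc[train_idx], df.iloc[test_idx]


# Synthetic stand-in for the real DataFrame (values are made up).
demo = pd.DataFrame({
    "EstablishmentID": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "AverageTemp_Day0": [10.0, 11.5, 20.1, 19.8, 5.2, 6.0, 15.3, 14.9, 25.0, 24.1],
    "SalmonellaSPAnalysis": ["Negative", "Positive"] * 5,
})

train_df, test_df = split_by_establishment(demo)

# Drop the grouping column and the target before building the feature matrix.
target = "SalmonellaSPAnalysis"
X_train = train_df.drop(columns=["EstablishmentID", target])
y_train = train_df[target]
```

With the real data, `demo` would be replaced by the `df` loaded above, and both target columns would be dropped from the features when predicting either pathogen.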

Provider:
food-ai-nexus