
food-ai-nexus/organic-milk-spores-us-farms

Source: Hugging Face. Updated 2026-03-26; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/food-ai-nexus/organic-milk-spores-us-farms
Resource description:
---
dataset_info:
  features:
  - name: FarmID
    dtype: string
  - name: SampleTime
    dtype: string
  - name: Sampling
    dtype: string
  - name: Loc
    dtype: string
  - name: CertYear
    dtype: int64
  - name: HouseStyle
    dtype: string
  - name: PastureTime
    dtype: float64
  - name: StockDen
    dtype: float64
  - name: Bed
    dtype: string
  - name: BedAdd
    dtype: string
  - name: BedAddFreq
    dtype: string
  - name: StallCleanFreq
    dtype: string
  - name: CowNum
    dtype: float64
  - name: MilkFreq
    dtype: float64
  - name: PplNumPerWk
    dtype: float64
  - name: PplNumPerShift
    dtype: float64
  - name: NonFamEmpNum
    dtype: float64
  - name: FullEmpNum
    dtype: int64
  - name: PartEmpNum
    dtype: int64
  - name: Glove
    dtype: string
  - name: GloveFreq
    dtype: string
  - name: PreDipType
    dtype: string
  - name: PostDipType
    dtype: string
  - name: UdderSti
    dtype: string
  - name: ClipFlame
    dtype: string
  - name: CowMilkLoc
    dtype: string
  - name: Parlor
    dtype: string
  - name: CowParlorClean
    dtype: string
  - name: CowWaitMilk
    dtype: string
  - name: CowHoldClean
    dtype: string
  - name: TeatEndScore
    dtype: string
  - name: UdderHygScore
    dtype: string
  - name: TowelType
    dtype: string
  - name: CowTowelWipe
    dtype: string
  - name: CornSilage
    dtype: string
  - name: Haylage
    dtype: string
  - name: CornMeal
    dtype: string
  - name: DryHay
    dtype: string
  - name: Baleage
    dtype: string
  - name: GrassSilage
    dtype: string
  - name: Earlage
    dtype: string
  - name: Snaplage
    dtype: string
  - name: BioFeedAdd
    dtype: string
  - name: DryMatPercent
    dtype: string
  - name: FeedPurchase
    dtype: string
  - name: ClosedHerd
    dtype: string
  - name: Test
    dtype: string
  - name: Conc
    dtype: float64
  - name: HoldHose
    dtype: string
  - name: HoldManScrap
    dtype: string
  - name: HoldFluSys
    dtype: string
  - name: HoldScrBru
    dtype: string
  - name: ParlorHose
    dtype: string
  - name: ParlorManScrap
    dtype: string
  - name: ParlorFluSys
    dtype: string
  - name: ParlorDeter
    dtype: string
  - name: ParlorScrBru
    dtype: string
  - name: ParlorRobot
    dtype: string
  - name: TowelChloDeter
    dtype: string
  - name: TowelDeter
    dtype: string
  - name: TowelBleac
    dtype: string
  - name: TowelMacDry
    dtype: string
  - name: TowelLaundry
    dtype: string
  - name: TowelVinegar
    dtype: string
  - name: TowelWashMac
    dtype: string
  - name: SPCvar
    dtype: float64
  - name: tempmax
    dtype: float64
  - name: tempmin
    dtype: float64
  - name: temp
    dtype: float64
  - name: humidity
    dtype: float64
  - name: precip
    dtype: float64
  - name: precipcover
    dtype: float64
  - name: windgust
    dtype: float64
  - name: windspeed
    dtype: float64
  - name: solarradiation
    dtype: float64
  - name: tempmax_1d
    dtype: float64
  - name: tempmin_1d
    dtype: float64
  - name: temp_1d
    dtype: float64
  - name: humidity_1d
    dtype: float64
  - name: precip_1d
    dtype: float64
  - name: precipcover_1d
    dtype: float64
  - name: windgust_1d
    dtype: float64
  - name: windspeed_1d
    dtype: float64
  - name: solarradiation_1d
    dtype: float64
  - name: tempmax_2d
    dtype: float64
  - name: tempmin_2d
    dtype: float64
  - name: temp_2d
    dtype: float64
  - name: humidity_2d
    dtype: float64
  - name: precip_2d
    dtype: float64
  - name: precipcover_2d
    dtype: float64
  - name: windgust_2d
    dtype: float64
  - name: windspeed_2d
    dtype: float64
  - name: solarradiation_2d
    dtype: float64
  - name: tempmax_3d
    dtype: float64
  - name: tempmin_3d
    dtype: float64
  - name: temp_3d
    dtype: float64
  - name: humidity_3d
    dtype: float64
  - name: precip_3d
    dtype: float64
  - name: precipcover_3d
    dtype: float64
  - name: windgust_3d
    dtype: float64
  - name: windspeed_3d
    dtype: float64
  - name: solarradiation_3d
    dtype: float64
  splits:
  - name: train
    num_bytes: 1581808
    num_examples: 2657
  download_size: 1581808
  dataset_size: 1581808
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train.csv
license: cc-by-4.0
task_categories:
- tabular-classification
- tabular-regression
tags:
- food-spoilage
- agriculture
- dairy
- microbiology
language:
- en
size_categories:
- 1K<n<10K
pretty_name: Bacterial Spore Levels in Organic Bulk Tank Milk (US Farms)
---

**Bacterial Spore Levels in Organic Bulk Tank Milk (US Farms)** is a longitudinal tabular dataset linking bacterial spore counts in bulk tank raw milk to farm management characteristics and meteorological conditions across 102 certified organic dairy farms in 11 US states. With this dataset, researchers can train machine learning models to identify farm-level and environmental predictors of bacterial spore contamination in organic bulk tank milk, and study the interplay between farm practices, weather, and microbial quality.

# Content

- The dataset contains 2,657 bulk tank raw milk test records from 102 certified organic dairy farms across 11 US states (California, Colorado, Iowa, Idaho, Minnesota, New York, Oregon, Pennsylvania, Vermont, Washington, Wisconsin).
- Farms were sampled bimonthly between May 2021 and August 2022 (up to 6 sampling rounds per farm, labeled A–H).
- It spans 102 columns covering the microbial test outcome, farm survey variables (housing, bedding, milking practices, feed, staffing), and meteorological variables for the day of sampling and the 3 preceding days.
- Farms varied widely in herd size (24–4,000 lactating cows), milking system (conventional parlor, stall/barn, or robotic), and geographic region.
- The dataset was used to train gradient-boosted tree and random forest models predicting bacterial spore levels. See the associated publication for full modeling details.

# Data Fields

The dataset contains 102 columns organized into five groups: identifiers, microbial outcome, farm survey variables, holding/parlor/towel cleaning indicators, and meteorological variables.

**Identifiers**

| Column | Description |
|---|---|
| `FarmID` | Anonymous farm identifier (R1–R102) |
| `SampleTime` | Date of sample collection (YYYY-MM-DD) |
| `Sampling` | Sampling round identifier (A–H, bimonthly) |
| `Loc` | Farm location (US state abbreviation: CA, CO, IA, ID, MN, NY, OR, PA, VT, WA, WI) |

**Microbial Outcome**

| Column | Description |
|---|---|
| `Test` | Type of microbiological test performed (SPC = standard plate count, BAB = butyric acid bacteria, MSC = mesophilic spore count) |
| `Conc` | Microbial concentration (cfu/mL or spores/mL; left-censored values imputed as half the detection limit) |
| `SPCvar` | Farm-level standard deviation of log₁₀(SPC) across all sampling rounds; used as a predictor of within-farm variability |

**Farm Characteristics**

| Column | Description |
|---|---|
| `CertYear` | Number of years the farm has been certified organic |
| `CowNum` | Number of milking cows on the day of survey (one missing value imputed as cross-farm mean) |
| `HouseStyle` | Housing style for lactating cows ("Free stalls", "Tie stalls or stanchions", "Bedded pack", "Other") |
| `StockDen` | Stocking density: ratio of cows to available stalls (0 for bedded pack, pasture, and dry lot farms) |
| `Bed` | Bedding material ("Sawdust", "Straw", "Manure solids", "OtherInorg", "Organic and inorganic blend", "Other organic") |
| `BedAdd` | Bedding additive used ("No bed additives", "Limestone", "Other") |
| `BedAddFreq` | Frequency of adding or topping up bedding ("< 1x per day", "1x per day", ">= 2x per day") |
| `StallCleanFreq` | Frequency of stall cleaning ("< 1x/day", "1x per day", "2x per day", ">= 3x per day") |
| `PastureTime` | Hours per day lactating cows spend on pasture at time of survey |
| `ClosedHerd` | Whether the farm maintains a closed herd ("Yes", "No") |
| `FeedPurchase` | Whether the farm purchases feed ("Yes", "No") |
| `DryMatPercent` | Percentage of dry matter intake currently from pasture ("< 40%", "40-70%", "> 70%") |

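The `SPCvar` definition in the Microbial Outcome table (per-farm standard deviation of log₁₀ of the SPC concentration) can be illustrated with a small pandas sketch. This is not the authors' code: the function name `farm_spc_variability` is hypothetical, the toy values are made up for illustration, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the card does not say which estimator was used.

```python
import numpy as np
import pandas as pd


def farm_spc_variability(df: pd.DataFrame) -> pd.Series:
    """Per-farm standard deviation of log10(Conc), using SPC records only."""
    # Keep only standard plate count rows; other test types are ignored here.
    spc = df.loc[df["Test"] == "SPC", ["FarmID", "Conc"]].copy()
    # Conc is strictly positive (censored values are imputed as half the
    # detection limit), so the log transform is defined for every row.
    spc["log10_conc"] = np.log10(spc["Conc"])
    # ddof=1 (sample SD) is an assumption, not something the card states.
    return spc.groupby("FarmID")["log10_conc"].std(ddof=1)


# Toy records for illustration only; these are not values from the dataset.
toy = pd.DataFrame({
    "FarmID": ["R1", "R1", "R1", "R2", "R2"],
    "Test": ["SPC", "SPC", "MSC", "SPC", "SPC"],
    "Conc": [100.0, 10000.0, 5.0, 1000.0, 1000.0],
})
spc_sd = farm_spc_variability(toy)
print(spc_sd)
```

Comparing the result against the shipped `SPCvar` column on the real data would show whether the assumed estimator matches the one used to build the dataset.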
**Staffing**

| Column | Description |
|---|---|
| `PplNumPerWk` | Number of different people milking cows throughout a week |
| `PplNumPerShift` | Number of people milking cows in one milking shift |
| `NonFamEmpNum` | Number of non-family employees who milk cows (0 if none) |
| `FullEmpNum` | Number of full-time employees who milk cows (0 if none) |
| `PartEmpNum` | Number of part-time employees who milk cows (0 if none) |

**Milking Practices**

| Column | Description |
|---|---|
| `MilkFreq` | Average number of milkings per cow per day (numeric; 2.75 = midpoint of the original "2.7–2.8" robotic range) |
| `CowMilkLoc` | Location where cows are milked ("Parlor", "Stall/barn", "Robot") |
| `Parlor` | Whether cows are milked in a parlor ("Yes", "No") |
| `Glove` | Whether gloves are worn during milking ("Yes", "No", "Sometimes") |
| `GloveFreq` | Frequency of glove changes per milking shift ("1", "1-2", ">3", "unknown") |
| `PreDipType` | Type of pre-dip used ("Iodine based", "Hydrogen peroxide based", "No predip", "Other") |
| `PostDipType` | Type of post-dip used ("Iodine based", "Hydrogen peroxide based", "No PostDip", "Other") |
| `UdderSti` | Udder stimulation method prior to milking ("Forestripping", "None", "Other") |
| `ClipFlame` | Whether lactating cow udders are clipped and/or flamed ("Yes", "No") |
| `CowParlorClean` | Whether cows are present when the parlor or milking area is cleaned ("Yes", "No") |
| `CowWaitMilk` | Whether cows wait in a holding area before milking ("Yes", "No") |
| `CowHoldClean` | Whether cows are present during holding area cleaning ("Yes", "No", "No holding area") |
| `TeatEndScore` | Whether teat end scoring is routinely performed on the farm ("Yes", "No") |
| `UdderHygScore` | Whether udder hygiene is routinely scored on the farm ("Yes", "No") |
| `TowelType` | Type of towel used to clean teats ("Paper", "Laundered", "Moistened wipes", "Robot brush") |
| `CowTowelWipe` | Number of cows wiped with one individual towel ("0-1", "> 1") |

**Feed Types (Binary Indicators)**

Each column indicates whether the farm feeds the specified feed type ("Yes" / "No").

| Column | Description |
|---|---|
| `CornSilage` | Farm feeds corn silage |
| `Haylage` | Farm feeds haylage |
| `CornMeal` | Farm feeds corn meal |
| `DryHay` | Farm feeds dry hay |
| `Baleage` | Farm feeds baleage |
| `GrassSilage` | Farm feeds grass silage |
| `Earlage` | Farm feeds earlage |
| `Snaplage` | Farm feeds snaplage |
| `BioFeedAdd` | Whether biological feed additives are used ("None", "Other") |

**Holding Area Cleaning Methods (Boolean Indicators)**

Derived by splitting the free-text holding area cleaning response. `TRUE` if the method was mentioned.

| Column | Description |
|---|---|
| `HoldHose` | Holding area cleaned by hosing |
| `HoldManScrap` | Holding area cleaned by manual scraping |
| `HoldFluSys` | Holding area cleaned by flush system |
| `HoldScrBru` | Holding area cleaned by scrub brush |

**Parlor Cleaning Methods (Boolean Indicators)**

Derived by splitting the free-text parlor cleaning response. `TRUE` if the method was mentioned.

| Column | Description |
|---|---|
| `ParlorHose` | Parlor cleaned by hosing |
| `ParlorManScrap` | Parlor cleaned by manual scraping |
| `ParlorFluSys` | Parlor cleaned by flush/automated system |
| `ParlorDeter` | Parlor cleaned with detergent |
| `ParlorScrBru` | Parlor cleaned with scrub brush |
| `ParlorRobot` | Parlor cleaned by robot |

**Towel Cleaning Protocol (Boolean Indicators)**

Derived by splitting the free-text towel cleaning protocol response. `TRUE` if the method was mentioned.

| Column | Description |
|---|---|
| `TowelChloDeter` | Towels washed with chlorinated detergent |
| `TowelDeter` | Towels washed with detergent |
| `TowelBleac` | Towels washed with bleach |
| `TowelMacDry` | Towels machine dried |
| `TowelLaundry` | Towels sent to laundry service |
| `TowelVinegar` | Towels washed with vinegar |
| `TowelWashMac` | Towels washed in a washing machine |

**Meteorological Variables**

Weather data were obtained from Visual Crossing for each farm location. Four time windows are provided: the day of sampling (no suffix), 1 day prior (`_1d`), 2 days prior (`_2d`), and 3 days prior (`_3d`). The nine base columns below are repeated for each window, giving 36 weather columns in total.

| Column (base name) | Description | Units |
|---|---|---|
| `tempmax` | Maximum daily temperature | °F |
| `tempmin` | Minimum daily temperature | °F |
| `temp` | Mean daily temperature | °F |
| `humidity` | Mean relative humidity | % |
| `precip` | Total precipitation | inches |
| `precipcover` | Percentage of day with measurable precipitation | % |
| `windgust` | Maximum wind gust speed (0 if missing) | mph |
| `windspeed` | Mean wind speed | mph |
| `solarradiation` | Mean solar radiation | W/m² |

# Uses

The dataset was originally used to train gradient-boosted tree (XGBoost) and random forest models predicting bacterial spore levels in organic bulk tank milk from farm characteristics and meteorological variables. It can also be used for research in organic dairy food safety, agricultural microbiology, farm management optimization, and longitudinal mixed-effects modeling.

Use the **"Use this dataset"** button at the top of the page to load the dataset into your preferred library. To load and prepare the data:

```python
import pandas as pd
from datasets import load_dataset

# Download the default configuration from the Hugging Face Hub
ds = load_dataset("food-ai-nexus/organic-milk-spores-us-farms")
# Convert the single "train" split to a pandas DataFrame
df = ds["train"].to_pandas()
```

# License

This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). It is intended for research and educational use. Please cite the associated publication when using this dataset.

# Reference

```bibtex
@article{qian2025organic,
  title={A Machine--Learning Approach Reveals That Bacterial Spore Levels in Organic Bulk Tank Milk are Dependent on Farm Characteristics and Meteorological Factors},
  author={Qian, C. and Wiedmann, M. and Martin, N.H.},
  journal={Journal of Food Protection},
  volume={88},
  pages={100477},
  year={2025},
  doi={10.1016/j.jfp.2024.100477}
}
```
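For the modeling use described under Uses, the sketch below shows one possible way to prepare the table before fitting a model: build the 36 weather column names from the base names and lag suffixes, restrict to a single test type with a log₁₀ target, and hold out whole farms so that repeated samples from one farm never appear on both sides of the split. The helper names (`weather_columns`, `prepare_target`, `split_by_farm`), the choice of `MSC` as the default test type, and the 20% hold-out fraction are all assumptions for illustration; the associated publication, not this sketch, documents how the authors actually trained their models.

```python
import numpy as np
import pandas as pd

WEATHER_BASE = ["tempmax", "tempmin", "temp", "humidity", "precip",
                "precipcover", "windgust", "windspeed", "solarradiation"]
# Day of sampling has no suffix; 1 to 3 days prior use _1d, _2d, _3d.
LAG_SUFFIXES = ["", "_1d", "_2d", "_3d"]


def weather_columns() -> list:
    """All weather column names: 9 base names times 4 time windows."""
    return [base + suffix for suffix in LAG_SUFFIXES for base in WEATHER_BASE]


def prepare_target(df: pd.DataFrame, test_type: str = "MSC") -> pd.DataFrame:
    """Keep one test type and add a log10-transformed concentration column."""
    subset = df[df["Test"] == test_type].copy()
    subset["log10_conc"] = np.log10(subset["Conc"])
    return subset


def split_by_farm(df: pd.DataFrame, test_fraction: float = 0.2, seed: int = 0):
    """Hold out whole farms so no farm contributes rows to both sets."""
    farms = sorted(set(df["FarmID"]))
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(len(farms) * test_fraction)))
    picked = rng.permutation(len(farms))[:n_test]
    test_farms = [farms[i] for i in picked]
    is_test = df["FarmID"].isin(test_farms)
    return df[~is_test], df[is_test]


# Toy frame for illustration only; these are not values from the dataset.
toy = pd.DataFrame({
    "FarmID": ["R1", "R1", "R2", "R2", "R3", "R3", "R4", "R4", "R5", "R5"],
    "Test": ["MSC"] * 10,
    "Conc": [10.0] * 10,
})
prepared = prepare_target(toy)
train, test = split_by_farm(prepared)
print(len(weather_columns()), len(train), len(test))
```

Splitting by `FarmID` rather than by row is a design choice suited to longitudinal data: farm survey variables are constant within a farm, so a row-level split would let a model recognize farms it has already seen.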

Provider:
food-ai-nexus