food-ai-nexus/raw-milk-quality-nys-farms
Source: Hugging Face. Updated 2026-03-26; indexed 2026-03-29.
Download link: https://hf-mirror.com/datasets/food-ai-nexus/raw-milk-quality-nys-farms
---
dataset_info:
  features:
  - name: FarmID
    dtype: int64
  - name: Sampling
    dtype: string
  - name: scc
    dtype: float64
  - name: logscc
    dtype: float64
  - name: bf
    dtype: float64
  - name: lactose
    dtype: float64
  - name: protein
    dtype: float64
  - name: solids
    dtype: float64
  - name: mun
    dtype: float64
  - name: denovofa
    dtype: float64
  - name: mixedfa
    dtype: float64
  - name: preformfa
    dtype: float64
  - name: bactospc
    dtype: float64
  - name: logbacto
    dtype: float64
  - name: tlc
    dtype: float64
  - name: logtlc
    dtype: float64
  - name: n
    dtype: float64
  - name: l
    dtype: float64
  - name: m
    dtype: float64
  - name: apc
    dtype: float64
  - name: logapc
    dtype: float64
  - name: pi
    dtype: float64
  - name: logpi
    dtype: float64
  - name: lpc
    dtype: float64
  - name: loglpc
    dtype: float64
  - name: cc
    dtype: float64
  - name: logcc
    dtype: float64
  - name: msc
    dtype: float64
  - name: logmsc
    dtype: float64
  - name: tsc
    dtype: float64
  - name: logtsc
    dtype: float64
  - name: bab
    dtype: float64
  - name: logbab
    dtype: float64
  - name: psc
    dtype: int64
  - name: logpsc
    dtype: float64
  - name: sensory
    dtype: float64
  - name: Attributes
    dtype: string
  - name: c18_1
    dtype: float64
  - name: munsat_db_fa
    dtype: float64
  - name: Collection_Date
    dtype: string
  - name: Month
    dtype: int64
  - name: Farm_Name
    dtype: string
  - name: Region
    dtype: string
  - name: Housing_Style
    dtype: string
  - name: Stocking_Density
    dtype: int64
  - name: Bedding_Type
    dtype: string
  - name: Bedding_Additives
    dtype: string
  - name: Bedding_Frequency
    dtype: float64
  - name: Alleyway_Cleaning
    dtype: string
  - name: Number_Cows
    dtype: int64
  - name: Cows_Milk_Frequency
    dtype: float64
  - name: Robot_Milk
    dtype: string
  - name: Predip_Use
    dtype: string
  - name: Predip_Type
    dtype: string
  - name: Postdip_Type
    dtype: string
  - name: Udder_Stimulation
    dtype: string
  - name: Udder_Clipped_Flamed
    dtype: string
  - name: Udder_Clipped_Flamed_frequency
    dtype: float64
  - name: Udder_Clipped_Flamed_Time_3
    dtype: string
  - name: Parlor_Type
    dtype: string
  - name: Water_to_Clean_Milking
    dtype: string
  - name: Cows_Present_While_Clean_Milking
    dtype: string
  - name: Cow_Holding_Area
    dtype: string
  - name: Water_to_Clean_Holding
    dtype: string
  - name: Cows_Present_While_Clean_Holding
    dtype: string
  - name: Teat_Scoring
    dtype: string
  - name: Udder_Hygeine_Scoring
    dtype: string
  - name: Towel_Type
    dtype: string
  - name: Detergent_Towel
    dtype: string
  - name: Bleach_Chlorine_Towel
    dtype: string
  - name: Machine_Dry_Towel
    dtype: string
  - name: Cow_Per_Towel
    dtype: string
  - name: Towel_Replacement
    dtype: string
  - name: Mastitic_Milking
    dtype: string
  - name: Pounds_Vacuum
    dtype: float64
  - name: Milking_Liner_Check
    dtype: string
  - name: Pulsation_Milking_Unit
    dtype: string
  - name: System_Sanitize_Frequency
    dtype: float64
  - name: Water_Temperature
    dtype: float64
  - name: Water_Softner
    dtype: string
  - name: Water_Purification
    dtype: string
  - name: Plate_Cooler
    dtype: int64
  - name: Sprinkler
    dtype: string
  - name: Sprinkler_3_days
    dtype: string
  - name: Feed_Additives
    dtype: string
  - name: Pasture
    dtype: string
  - name: Pasture_Time
    dtype: float64
  - name: Water_Purification_Chemical
    dtype: int64
  - name: Water_Purification_Physical
    dtype: int64
  - name: Water_Purification_UV
    dtype: int64
  - name: Water_Sources_Ground
    dtype: int64
  - name: Water_Sources_Surface
    dtype: int64
  - name: Water_Sources_Municipal
    dtype: int64
  - name: Chiller
    dtype: int64
  - name: Udder_Clipped_Flamed_Consistency
    dtype: string
  - name: Pounds_Vacuum_Known
    dtype: string
  - name: Water_Temperature_Known
    dtype: string
  splits:
  - name: train
    num_bytes: 428882
    num_examples: 569
  download_size: 428882
  dataset_size: 428882
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train.csv
license: cc-by-4.0
task_categories:
- tabular-classification
- tabular-regression
tags:
- food-spoilage
- agriculture
- dairy
- microbiology
language:
- en
size_categories:
- n<1K
pretty_name: Raw Milk Quality and Dairy Farm Characteristics (New York State)
---
**Raw Milk Quality and Dairy Farm Characteristics (New York State)** is a longitudinal tabular dataset linking bulk tank raw milk quality measurements to farm management practices and characteristics across 96 conventional dairy farms in New York State.
With this dataset, researchers can train machine learning models to identify farm-level predictors of raw milk quality outcomes, including microbial counts, somatic cell count, milk composition, and sensory defects.
# Content
- The dataset contains 569 bulk tank raw milk samples collected from 96 farms across New York State between July 2023 and September 2024 (15 months).
- Each farm was sampled approximately every 2 months for up to 6 visits (sampling rounds A–F).
- It spans 97 columns covering milk quality outcomes (microbial counts, somatic cell count, milk composition, sensory scores) and farm characteristics (housing, bedding, milking practices, equipment, water management, and more).
- Farms varied widely in size (14–6,400 lactating cows), milking system (conventional parlor vs. robotic), and geographical region.
- The dataset was used to fit random forest models predicting 11 milk quality outcomes. See the associated publication for full modeling details.
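As a quick sanity check on the sampling design described above, the number of sampling rounds recorded per farm can be counted from the `FarmID` and `Sampling` columns. The sketch below is illustrative only: the helper name `visits_per_farm` and the toy frame are assumptions, and the toy values are made up rather than taken from the dataset.

```python
import pandas as pd


def visits_per_farm(df: pd.DataFrame) -> pd.Series:
    """Count the distinct sampling rounds (A-F) recorded for each farm."""
    return df.groupby("FarmID")["Sampling"].nunique().sort_index()


# Made-up stand-in for the real data: two farms with three and two visits.
toy = pd.DataFrame(
    {
        "FarmID": [103, 103, 103, 104, 104],
        "Sampling": ["A", "B", "C", "A", "B"],
    }
)
visits = visits_per_farm(toy)
print(visits)
```

On the real data, the same call on the loaded DataFrame should return at most 6 for every farm if the description above holds.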
# Data Fields
The dataset contains 97 columns organized into three broad groups: identifiers, milk quality outcomes (with log-transformed versions), and farm survey variables. The tables below break these groups down further.
**Identifiers**
| Column | Description |
|---|---|
| `FarmID` | Anonymized numeric farm identifier (103–202) |
| `Sampling` | Sampling round (A–F, approximately every 2 months) |
| `Collection_Date` | Date of sample collection (YYYY-MM-DD) |
| `Month` | Month of collection (integer, 1–12) |
| `Farm_Name` | Anonymized farm name |
| `Region` | Geographic region within New York State |
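Because `Collection_Date` is stored as a string, it usually needs to be parsed before any time-based analysis. The following is a minimal sketch, assuming the dates follow the YYYY-MM-DD format stated in the table; the helper name and the added columns `Collection_Datetime` and `Month_Matches` are hypothetical, and the toy values are made up.

```python
import pandas as pd


def add_collection_datetime(df: pd.DataFrame) -> pd.DataFrame:
    """Parse Collection_Date and flag rows where it agrees with Month."""
    out = df.copy()
    out["Collection_Datetime"] = pd.to_datetime(
        out["Collection_Date"], format="%Y-%m-%d"
    )
    # True where the parsed month equals the stored integer month.
    out["Month_Matches"] = out["Collection_Datetime"].dt.month.eq(out["Month"])
    return out


# Made-up rows for illustration only.
toy_dates = pd.DataFrame(
    {"Collection_Date": ["2023-07-15", "2024-09-02"], "Month": [7, 9]}
)
parsed = add_collection_datetime(toy_dates)
print(parsed)
```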
**Somatic Cell Count**
| Column | Description |
|---|---|
| `scc` | Somatic cell count (cells/mL) |
| `logscc` | log₁₀(scc) |
**Milk Composition (FTIR)**
| Column | Description |
|---|---|
| `bf` | Butterfat (g/100 g milk) |
| `lactose` | Lactose (g/100 g milk) |
| `protein` | True protein (g/100 g milk) |
| `solids` | Total solids (g/100 g milk) |
| `mun` | Milk urea nitrogen (mg/dL) |
| `denovofa` | De novo fatty acids (% of total fatty acids) |
| `mixedfa` | Mixed-origin fatty acids (% of total fatty acids) |
| `preformfa` | Preformed fatty acids (% of total fatty acids) |
| `c18_1` | C18:1 fatty acid (% of total fatty acids) |
| `munsat_db_fa` | Ratio of monounsaturated to saturated and double-bond fatty acids |
**Microbial Quality**
| Column | Description |
|---|---|
| `bactospc` | Bactoscan flow cytometry count (cells/mL) |
| `logbacto` | log₁₀(bactospc) |
| `tlc` | Total laboratory count — aerobic plate count at 32°C (cfu/mL) |
| `logtlc` | log₁₀(tlc) |
| `n` | Nonfragmented spore count — aerobic spore count (cfu/mL) |
| `l` | Lab pasteurization count — heat-treated APC (cfu/mL) |
| `m` | Modified lab pasteurization count — heat-treated APC variant (cfu/mL) |
| `apc` | Aerobic plate count at 21°C (cfu/mL) |
| `logapc` | log₁₀(apc) |
| `pi` | Preliminary incubation count — APC after 18h at 13°C (cfu/mL) |
| `logpi` | log₁₀(pi) |
| `lpc` | Lab pasteurization count — heat-treated APC (cfu/mL) |
| `loglpc` | log₁₀(lpc) |
| `cc` | Coliform count (cfu/mL) |
| `logcc` | log₁₀(cc) |
| `msc` | Mesophilic spore count (spores/mL) |
| `logmsc` | log₁₀(msc) |
| `tsc` | Thermophilic spore count (spores/mL) |
| `logtsc` | log₁₀(tsc) |
| `bab` | Butyric acid bacteria count (spores/L) |
| `logbab` | log₁₀(bab) |
| `psc` | Psychrotrophic spore count (MPN/L; left-censored at 20 MPN/L, imputed as 5 MPN/L) |
| `logpsc` | log₁₀(psc) |
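The log columns above are described as log₁₀ transforms of the raw counts, and `psc` values below the 20 MPN/L limit are described as imputed to 5 MPN/L. A rough way to check both on a loaded DataFrame is sketched below. The helper name `log_matches` and the tolerance are assumptions, the toy values are made up, and the sketch assumes that no genuine `psc` measurement equals 5.

```python
import numpy as np
import pandas as pd


def log_matches(
    df: pd.DataFrame, raw: str, logged: str, atol: float = 1e-4
) -> pd.Series:
    """True where `logged` equals log10(`raw`); non-positive raw values are skipped."""
    positive = df[raw] > 0
    expected = np.log10(df[raw].where(positive))
    ok = np.isclose(expected.to_numpy(), df[logged].to_numpy(), atol=atol)
    # Rows with a non-positive raw value cannot be checked, so pass them through.
    return pd.Series(ok | ~positive.to_numpy(), index=df.index)


# Made-up values: one imputed (5 MPN/L) and two measured counts.
toy_psc = pd.DataFrame(
    {"psc": [5, 40, 200], "logpsc": [0.69897, 1.60206, 2.30103]}
)
consistent = log_matches(toy_psc, "psc", "logpsc")

# Assumption: a value of exactly 5 marks a left-censored, imputed observation.
censored = toy_psc["psc"].eq(5)
print(consistent.all(), int(censored.sum()))
```

Keeping a censoring flag alongside `logpsc` lets a model or summary treat below-limit samples separately instead of as exact measurements.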
**Sensory**
| Column | Description |
|---|---|
| `sensory` | Expert panel sensory score (0–10 scale; higher = more defective) |
| `Attributes` | Descriptive sensory attribute(s) identified by the panel |
**Farm Characteristics**
| Column | Description |
|---|---|
| `Housing_Style` | Barn housing type (e.g., freestall, tiestall, drylot) |
| `Stocking_Density` | Stocking density (% of barn capacity; NA imputed as 100%) |
| `Bedding_Type` | Bedding material category (organic, inorganic, combo, none) |
| `Bedding_Additives` | Whether bedding additives are used (yes, no) |
| `Bedding_Frequency` | Frequency of bedding addition (times/week; 0 = not applicable) |
| `Alleyway_Cleaning` | Alleyway cleaning method (tractor, manual, auto, No Alleyway, other) |
| `Number_Cows` | Number of lactating cows |
| `Cows_Milk_Frequency` | Milking frequency (times/day) |
| `Robot_Milk` | Whether robotic milking is used (yes, no) |
| `Parlor_Type` | Milking parlor type (herringbone, parallel, rotary, robots, other) |
| `Feed_Additives` | Whether feed additives are used (yes, no) |
| `Pasture` | Whether cows have pasture access (yes, no) |
| `Pasture_Time` | Average daily pasture time (hours/day; 0 if no pasture) |
**Milking Practices**
| Column | Description |
|---|---|
| `Predip_Use` | Whether pre-dip is used (yes, no, Cow Brush) |
| `Predip_Type` | Pre-dip product type (iodine, hydrogen peroxide, chlorine dioxide, chlorine, chlorohexidine, other, no pre dip) |
| `Postdip_Type` | Post-dip product type (iodine, chlorine dioxide, chlorohexidine, combination, other, no post dip) |
| `Udder_Stimulation` | Whether forestripping or other stimulation is used (yes, no, inconsistently) |
| `Udder_Clipped_Flamed` | Whether udders are clipped or flamed (yes, no) |
| `Udder_Clipped_Flamed_frequency` | Frequency of clipping/flaming (times/year; NA = inconsistent schedule) |
| `Udder_Clipped_Flamed_Time_3` | Time since last clipping/flaming (3 months or less, More than three months, Not clipped/flamed) |
| `Udder_Clipped_Flamed_Consistency` | Consistency of clipping/flaming schedule (Consistent, Inconsistent) |
| `Towel_Type` | Teat drying method (individual paper, cloth, brush, none) |
| `Detergent_Towel` | Whether detergent is used to wash towels (yes, no) |
| `Bleach_Chlorine_Towel` | Whether bleach/chlorine is used to wash towels (yes, no) |
| `Machine_Dry_Towel` | Whether towels are machine dried (yes, no) |
| `Cow_Per_Towel` | Number of cows per towel (0 = no individual towel used) |
| `Towel_Replacement` | Towel replacement frequency |
| `Mastitic_Milking` | Whether mastitic cows are milked last or separately (yes, no, inconsistently) |
| `Teat_Scoring` | Whether teat scoring is performed (yes, no) |
| `Udder_Hygeine_Scoring` | Whether udder hygiene scoring is performed (yes, no) |
**Milking Equipment**
| Column | Description |
|---|---|
| `Pounds_Vacuum` | Milking vacuum level (pounds; NA if unknown) |
| `Pounds_Vacuum_Known` | Whether vacuum level is known (Known, Unknown, NA) |
| `Milking_Liner_Check` | Frequency of milking liner inspection/replacement |
| `Pulsation_Milking_Unit` | Whether pulsation is checked on milking units (yes, no) |
| `Plate_Cooler` | Whether a plate cooler is used (1 = yes, 0 = no) |
| `Chiller` | Whether a supplemental chiller (pre-chiller or tube cooler) is used (1 = yes, 0 = no) |
**Water & Sanitation**
| Column | Description |
|---|---|
| `System_Sanitize_Frequency` | Frequency of milking system cleaning and sanitizing (times/day) |
| `Water_Temperature` | Hot water temperature used for cleaning/sanitation cycle (°F; NA if unknown or no pipeline) |
| `Water_Temperature_Known` | Whether water temperature is known (Known, Unknown, No pipeline) |
| `Water_Softner` | Whether a water softener is used (yes, no) |
| `Water_Purification` | Whether water purification is used (yes, no) |
| `Water_Purification_Chemical` | Whether chemical water purification is used (1 = yes, 0 = no) |
| `Water_Purification_Physical` | Whether physical water purification is used (1 = yes, 0 = no) |
| `Water_Purification_UV` | Whether UV water purification is used (1 = yes, 0 = no) |
| `Water_Sources_Ground` | Whether ground water is a source (1 = yes, 0 = no) |
| `Water_Sources_Surface` | Whether surface water is a source (1 = yes, 0 = no) |
| `Water_Sources_Municipal` | Whether municipal water is a source (1 = yes, 0 = no) |
| `Water_to_Clean_Milking` | Whether water is used to clean the milking area (yes, no) |
| `Cows_Present_While_Clean_Milking` | Whether cows are present while the milking area is cleaned (yes, no) |
| `Cow_Holding_Area` | Whether a cow holding area exists (yes, no) |
| `Water_to_Clean_Holding` | Whether water is used to clean the holding area (yes, no, No holding area) |
| `Cows_Present_While_Clean_Holding` | Whether cows are present while the holding area is cleaned (yes, no) |
| `Sprinkler` | Whether sprinklers are used in the holding area (yes, no) |
| `Sprinkler_3_days` | Whether sprinklers were used in the 3 days prior to sampling (yes, no) |
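The survey variables mix string categories (for example `yes`/`no`), 0/1 integers, and numeric columns with missing values, so most models need an encoding step first. The sketch below shows one possible approach, not the authors' preprocessing: one-hot encoding for categoricals and median imputation plus a missingness indicator for numerics. The helper name `encode_farm_features`, the `_missing` suffix, and the toy values are all assumptions. The dataset already ships its own indicators such as `Pounds_Vacuum_Known`, which can be used instead.

```python
import pandas as pd


def encode_farm_features(
    df: pd.DataFrame, categorical: list[str], numeric: list[str]
) -> pd.DataFrame:
    """One-hot encode categoricals; median-impute numerics and flag missing values."""
    cats = pd.get_dummies(
        df[categorical].fillna("missing"), columns=categorical, dtype=int
    )
    nums = df[numeric].copy()
    for col in numeric:
        # Record missingness before filling so the information is not lost.
        nums[f"{col}_missing"] = nums[col].isna().astype(int)
        nums[col] = nums[col].fillna(nums[col].median())
    return pd.concat([nums, cats], axis=1)


# Made-up rows for illustration only.
toy_farm = pd.DataFrame(
    {
        "Robot_Milk": ["yes", "no", None],
        "Pounds_Vacuum": [13.5, None, 14.5],
    }
)
features = encode_farm_features(toy_farm, ["Robot_Milk"], ["Pounds_Vacuum"])
print(features)
```

Median imputation is only a placeholder choice here; the associated publication should be consulted for how missing values were actually handled.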
# Uses
The dataset was originally used to fit random forest models predicting 11 bulk tank raw milk quality outcomes from farm management and characteristic variables. It can also be used for research in dairy food safety, agricultural microbiology, farm management optimization, and longitudinal mixed-effects modeling.
Use the **"Use this dataset"** button at the top of the page to load the dataset into your preferred library. To load the data into a pandas DataFrame with the `datasets` library:
```python
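import pandas as pd  # needed by to_pandas()
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub; it has a single "train" split.
ds = load_dataset("food-ai-nexus/raw-milk-quality-nys-farms")

# Convert the split to a pandas DataFrame for analysis.
df = ds["train"].to_pandas()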
```
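Because each farm contributes several samples, a random split can place visits from the same farm in both the training and test sets. One way to avoid this is to group cross-validation folds by `FarmID`. The sketch below uses scikit-learn's `GroupKFold` with a random forest regressor on synthetic data. It is an illustration under stated assumptions, not the modeling pipeline of the associated publication: the helper name `farm_grouped_cv`, the hyperparameters, and the synthetic data are all made up.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_score


def farm_grouped_cv(X, y, farm_ids, n_splits=5, seed=0):
    """Cross-validated R^2 with all samples from one farm kept in the same fold."""
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    cv = GroupKFold(n_splits=n_splits)
    return cross_val_score(model, X, y, groups=farm_ids, cv=cv, scoring="r2")


# Synthetic stand-in for the real data: 12 hypothetical farms, 5 visits each.
rng = np.random.default_rng(0)
farm_ids = np.repeat(np.arange(12), 5)
X = rng.normal(size=(60, 4))
y = X[:, 0] + rng.normal(scale=0.1, size=60)

scores = farm_grouped_cv(X, y, farm_ids, n_splits=3)
print(scores)
```

With the real data, `X` would be an encoded feature matrix built from the farm survey columns, `y` one of the outcome columns such as `logbacto`, and `farm_ids` the `FarmID` column.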
# License
This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). It is intended for research and educational use. Please cite the associated publication when using this dataset.
# Reference
```bibtex
@article{shaposhnikov2026sporeformers,
title={From sporeformers to sensory: Measures of bulk tank raw milk quality are impacted by dairy farm characteristics and management practices},
author={Shaposhnikov, M.M. and Weachock, R.L. and Wasserlauf-Pepper, Z.D. and Qian, C. and Barbano, D.M. and Martin, N.H.},
journal={Journal of Dairy Science},
year={2026},
note={In Press},
doi={10.3168/jds.2025-27772}
}
```
Provided by: food-ai-nexus



