
food-ai-nexus/raw-milk-quality-nys-farms

Hugging Face, updated 2026-03-26, indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/food-ai-nexus/raw-milk-quality-nys-farms
Resource description:
---
dataset_info:
  features:
  - name: FarmID
    dtype: int64
  - name: Sampling
    dtype: string
  - name: scc
    dtype: float64
  - name: logscc
    dtype: float64
  - name: bf
    dtype: float64
  - name: lactose
    dtype: float64
  - name: protein
    dtype: float64
  - name: solids
    dtype: float64
  - name: mun
    dtype: float64
  - name: denovofa
    dtype: float64
  - name: mixedfa
    dtype: float64
  - name: preformfa
    dtype: float64
  - name: bactospc
    dtype: float64
  - name: logbacto
    dtype: float64
  - name: tlc
    dtype: float64
  - name: logtlc
    dtype: float64
  - name: n
    dtype: float64
  - name: l
    dtype: float64
  - name: m
    dtype: float64
  - name: apc
    dtype: float64
  - name: logapc
    dtype: float64
  - name: pi
    dtype: float64
  - name: logpi
    dtype: float64
  - name: lpc
    dtype: float64
  - name: loglpc
    dtype: float64
  - name: cc
    dtype: float64
  - name: logcc
    dtype: float64
  - name: msc
    dtype: float64
  - name: logmsc
    dtype: float64
  - name: tsc
    dtype: float64
  - name: logtsc
    dtype: float64
  - name: bab
    dtype: float64
  - name: logbab
    dtype: float64
  - name: psc
    dtype: int64
  - name: logpsc
    dtype: float64
  - name: sensory
    dtype: float64
  - name: Attributes
    dtype: string
  - name: c18_1
    dtype: float64
  - name: munsat_db_fa
    dtype: float64
  - name: Collection_Date
    dtype: string
  - name: Month
    dtype: int64
  - name: Farm_Name
    dtype: string
  - name: Region
    dtype: string
  - name: Housing_Style
    dtype: string
  - name: Stocking_Density
    dtype: int64
  - name: Bedding_Type
    dtype: string
  - name: Bedding_Additives
    dtype: string
  - name: Bedding_Frequency
    dtype: float64
  - name: Alleyway_Cleaning
    dtype: string
  - name: Number_Cows
    dtype: int64
  - name: Cows_Milk_Frequency
    dtype: float64
  - name: Robot_Milk
    dtype: string
  - name: Predip_Use
    dtype: string
  - name: Predip_Type
    dtype: string
  - name: Postdip_Type
    dtype: string
  - name: Udder_Stimulation
    dtype: string
  - name: Udder_Clipped_Flamed
    dtype: string
  - name: Udder_Clipped_Flamed_frequency
    dtype: float64
  - name: Udder_Clipped_Flamed_Time_3
    dtype: string
  - name: Parlor_Type
    dtype: string
  - name: Water_to_Clean_Milking
    dtype: string
  - name: Cows_Present_While_Clean_Milking
    dtype: string
  - name: Cow_Holding_Area
    dtype: string
  - name: Water_to_Clean_Holding
    dtype: string
  - name: Cows_Present_While_Clean_Holding
    dtype: string
  - name: Teat_Scoring
    dtype: string
  - name: Udder_Hygeine_Scoring
    dtype: string
  - name: Towel_Type
    dtype: string
  - name: Detergent_Towel
    dtype: string
  - name: Bleach_Chlorine_Towel
    dtype: string
  - name: Machine_Dry_Towel
    dtype: string
  - name: Cow_Per_Towel
    dtype: string
  - name: Towel_Replacement
    dtype: string
  - name: Mastitic_Milking
    dtype: string
  - name: Pounds_Vacuum
    dtype: float64
  - name: Milking_Liner_Check
    dtype: string
  - name: Pulsation_Milking_Unit
    dtype: string
  - name: System_Sanitize_Frequency
    dtype: float64
  - name: Water_Temperature
    dtype: float64
  - name: Water_Softner
    dtype: string
  - name: Water_Purification
    dtype: string
  - name: Plate_Cooler
    dtype: int64
  - name: Sprinkler
    dtype: string
  - name: Sprinkler_3_days
    dtype: string
  - name: Feed_Additives
    dtype: string
  - name: Pasture
    dtype: string
  - name: Pasture_Time
    dtype: float64
  - name: Water_Purification_Chemical
    dtype: int64
  - name: Water_Purification_Physical
    dtype: int64
  - name: Water_Purification_UV
    dtype: int64
  - name: Water_Sources_Ground
    dtype: int64
  - name: Water_Sources_Surface
    dtype: int64
  - name: Water_Sources_Municipal
    dtype: int64
  - name: Chiller
    dtype: int64
  - name: Udder_Clipped_Flamed_Consistency
    dtype: string
  - name: Pounds_Vacuum_Known
    dtype: string
  - name: Water_Temperature_Known
    dtype: string
  splits:
  - name: train
    num_bytes: 428882
    num_examples: 569
  download_size: 428882
  dataset_size: 428882
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train.csv
license: cc-by-4.0
task_categories:
- tabular-classification
- tabular-regression
tags:
- food-spoilage
- agriculture
- dairy
- microbiology
language:
- en
size_categories:
- n<1K
pretty_name: Raw Milk Quality and Dairy Farm Characteristics (New York State)
---

**Raw Milk Quality and Dairy Farm Characteristics (New York State)** is a
longitudinal tabular dataset linking bulk tank raw milk quality measurements to farm management practices and characteristics across 96 conventional dairy farms in New York State. With this dataset, researchers can train machine learning models to identify farm-level predictors of raw milk quality outcomes, including microbial counts, somatic cell count, milk composition, and sensory defects.

# Content

- The dataset contains 569 bulk tank raw milk samples collected from 96 farms across New York State between July 2023 and September 2024 (15 months).
- Each farm was sampled approximately every 2 months for up to 6 visits (sampling rounds A–F).
- It spans 97 columns covering milk quality outcomes (microbial counts, somatic cell count, milk composition, sensory scores) and farm characteristics (housing, bedding, milking practices, equipment, water management, and more).
- Farms varied widely in size (14–6,400 lactating cows), milking system (conventional parlor vs. robotic), and geographical region.
- The dataset was used to fit random forest models predicting 11 milk quality outcomes. See the associated publication for full modeling details.

# Data Fields

The dataset contains 97 columns organized into three broad groups: identifiers, milk quality outcomes (with log-transformed versions), and farm survey variables.
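The sampling structure described under Content (one row per farm visit, keyed by `FarmID` and `Sampling`) can be checked with a short script. The following is a minimal sketch using only the standard library; the helper name `summarize_visits` is hypothetical, and the demo rows are made up for illustration rather than taken from the dataset. With the real data, rows could come from `csv.DictReader` over `data/train.csv` or from a DataFrame via `df.to_dict("records")`.

```python
from collections import Counter, defaultdict


def summarize_visits(rows):
    """Count samples per farm and collect the sampling rounds seen for each farm.

    `rows` is any iterable of dict-like records with "FarmID" and "Sampling" keys.
    """
    samples_per_farm = Counter()
    rounds_by_farm = defaultdict(set)
    for row in rows:
        farm = row["FarmID"]
        samples_per_farm[farm] += 1
        rounds_by_farm[farm].add(row["Sampling"])
    # Sort the rounds so the output is stable and easy to read.
    rounds_sorted = {farm: sorted(r) for farm, r in rounds_by_farm.items()}
    return samples_per_farm, rounds_sorted


# Made-up rows for illustration only; these are not real records.
demo_rows = [
    {"FarmID": 103, "Sampling": "A"},
    {"FarmID": 103, "Sampling": "B"},
    {"FarmID": 104, "Sampling": "A"},
]

counts, rounds = summarize_visits(demo_rows)
print(len(counts), "farms,", sum(counts.values()), "samples")
print("most visits for a single farm:", max(counts.values()))
```

If the card's description holds, running this on the full table should report 96 farms, 569 samples, and no farm with more than 6 visits.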
**Identifiers**

| Column | Description |
|---|---|
| `FarmID` | Anonymous numeric farm identifier (103–202) |
| `Sampling` | Sampling round (A–F, approximately every 2 months) |
| `Collection_Date` | Date of sample collection (YYYY-MM-DD) |
| `Month` | Month of collection (integer, 1–12) |
| `Farm_Name` | Anonymous farm name |
| `Region` | Geographic region within New York State |

**Somatic Cell Count**

| Column | Description |
|---|---|
| `scc` | Somatic cell count (cells/mL) |
| `logscc` | log₁₀(scc) |

**Milk Composition (FTIR)**

| Column | Description |
|---|---|
| `bf` | Butterfat (g/100 g milk) |
| `lactose` | Lactose (g/100 g milk) |
| `protein` | True protein (g/100 g milk) |
| `solids` | Total solids (g/100 g milk) |
| `mun` | Milk urea nitrogen (mg/dL) |
| `denovofa` | De novo fatty acids (% of total fatty acids) |
| `mixedfa` | Mixed-origin fatty acids (% of total fatty acids) |
| `preformfa` | Preformed fatty acids (% of total fatty acids) |
| `c18_1` | C18:1 fatty acid (% of total fatty acids) |
| `munsat_db_fa` | Ratio of monounsaturated to saturated and double-bond fatty acids |

**Microbial Quality**

| Column | Description |
|---|---|
| `bactospc` | Bactoscan flow cytometry count (cells/mL) |
| `logbacto` | log₁₀(bactospc) |
| `tlc` | Total laboratory count: aerobic plate count at 32°C (cfu/mL) |
| `logtlc` | log₁₀(tlc) |
| `n` | Nonfragmented spore count: aerobic spore count (cfu/mL) |
| `l` | Lab pasteurization count: heat-treated APC (cfu/mL) |
| `m` | Modified lab pasteurization count: heat-treated APC variant (cfu/mL) |
| `apc` | Aerobic plate count at 21°C (cfu/mL) |
| `logapc` | log₁₀(apc) |
| `pi` | Preliminary incubation count: APC after 18 h at 13°C (cfu/mL) |
| `logpi` | log₁₀(pi) |
| `lpc` | Lab pasteurization count: heat-treated APC (cfu/mL) |
| `loglpc` | log₁₀(lpc) |
| `cc` | Coliform count (cfu/mL) |
| `logcc` | log₁₀(cc) |
| `msc` | Mesophilic spore count (spores/mL) |
| `logmsc` | log₁₀(msc) |
| `tsc` | Thermophilic spore count (spores/mL) |
| `logtsc` | log₁₀(tsc) |
| `bab` | Butyric acid bacteria count (spores/L) |
| `logbab` | log₁₀(bab) |
| `psc` | Psychrotrophic spore count (MPN/L; left-censored at 20 MPN/L, imputed as 5 MPN/L) |
| `logpsc` | log₁₀(psc) |

**Sensory**

| Column | Description |
|---|---|
| `sensory` | Expert panel sensory score (0–10 scale; higher = more defective) |
| `Attributes` | Descriptive sensory attribute(s) identified by the panel |

**Farm Characteristics**

| Column | Description |
|---|---|
| `Housing_Style` | Barn housing type (e.g., freestall, tiestall, drylot) |
| `Stocking_Density` | Stocking density (% of barn capacity; NA imputed as 100%) |
| `Bedding_Type` | Bedding material category (organic, inorganic, combo, none) |
| `Bedding_Additives` | Whether bedding additives are used (yes, no) |
| `Bedding_Frequency` | Frequency of bedding addition (times/week; 0 = not applicable) |
| `Alleyway_Cleaning` | Alleyway cleaning method (tractor, manual, auto, No Alleyway, other) |
| `Number_Cows` | Number of lactating cows |
| `Cows_Milk_Frequency` | Milking frequency (times/day) |
| `Robot_Milk` | Whether robotic milking is used (yes, no) |
| `Parlor_Type` | Milking parlor type (herringbone, parallel, rotary, robots, other) |
| `Region` | Geographic region within New York State |
| `Feed_Additives` | Whether feed additives are used (yes, no) |
| `Pasture` | Whether cows have pasture access (yes, no) |
| `Pasture_Time` | Average daily pasture time (hours/day; 0 if no pasture) |

**Milking Practices**

| Column | Description |
|---|---|
| `Predip_Use` | Whether pre-dip is used (yes, no, Cow Brush) |
| `Predip_Type` | Pre-dip product type (iodine, hydrogen peroxide, chlorine dioxide, chlorine, chlorohexidine, other, no pre dip) |
| `Postdip_Type` | Post-dip product type (iodine, chlorine dioxide, chlorohexidine, combination, other, no post dip) |
| `Udder_Stimulation` | Whether forestripping or other stimulation is used (yes, no, inconsistently) |
| `Udder_Clipped_Flamed` | Whether udders are clipped or flamed (yes, no) |
| `Udder_Clipped_Flamed_frequency` | Frequency of clipping/flaming (times/year; NA = inconsistent schedule) |
| `Udder_Clipped_Flamed_Time_3` | Time since last clipping/flaming (3 months or less, More than three months, Not clipped/flamed) |
| `Udder_Clipped_Flamed_Consistency` | Consistency of clipping/flaming schedule (Consistent, Inconsistent) |
| `Towel_Type` | Teat drying method (individual paper, cloth, brush, none) |
| `Detergent_Towel` | Whether detergent is used to wash towels (yes, no) |
| `Bleach_Chlorine_Towel` | Whether bleach/chlorine is used to wash towels (yes, no) |
| `Machine_Dry_Towel` | Whether towels are machine dried (yes, no) |
| `Cow_Per_Towel` | Number of cows per towel (0 = no individual towel used) |
| `Towel_Replacement` | Towel replacement frequency |
| `Mastitic_Milking` | Whether mastitic cows are milked last or separately (yes, no, inconsistently) |
| `Teat_Scoring` | Whether teat scoring is performed (yes, no) |
| `Udder_Hygeine_Scoring` | Whether udder hygiene scoring is performed (yes, no) |

**Milking Equipment**

| Column | Description |
|---|---|
| `Pounds_Vacuum` | Milking vacuum level (pounds; NA if unknown) |
| `Pounds_Vacuum_Known` | Whether vacuum level is known (Known, Unknown, NA) |
| `Milking_Liner_Check` | Frequency of milking liner inspection/replacement |
| `Pulsation_Milking_Unit` | Whether pulsation is checked on milking units (yes, no) |
| `Plate_Cooler` | Whether a plate cooler is used (1 = yes, 0 = no) |
| `Chiller` | Whether a supplemental chiller (pre-chiller or tube cooler) is used (1 = yes, 0 = no) |

**Water & Sanitation**

| Column | Description |
|---|---|
| `System_Sanitize_Frequency` | Frequency of milking system cleaning and sanitizing (times/day) |
| `Water_Temperature` | Hot water temperature used for cleaning/sanitation cycle (°F; NA if unknown or no pipeline) |
| `Water_Temperature_Known` | Whether water temperature is known (Known, Unknown, No pipeline) |
| `Water_Softner` | Whether a water softener is used (yes, no) |
| `Water_Purification` | Whether water purification is used (yes, no) |
| `Water_Purification_Chemical` | Whether chemical water purification is used (1 = yes, 0 = no) |
| `Water_Purification_Physical` | Whether physical water purification is used (1 = yes, 0 = no) |
| `Water_Purification_UV` | Whether UV water purification is used (1 = yes, 0 = no) |
| `Water_Sources_Ground` | Whether ground water is a source (1 = yes, 0 = no) |
| `Water_Sources_Surface` | Whether surface water is a source (1 = yes, 0 = no) |
| `Water_Sources_Municipal` | Whether municipal water is a source (1 = yes, 0 = no) |
| `Water_to_Clean_Milking` | Whether water is used to clean the milking area (yes, no) |
| `Cows_Present_While_Clean_Milking` | Whether cows are present while the milking area is cleaned (yes, no) |
| `Cow_Holding_Area` | Whether a cow holding area exists (yes, no) |
| `Water_to_Clean_Holding` | Whether water is used to clean the holding area (yes, no, No holding area) |
| `Cows_Present_While_Clean_Holding` | Whether cows are present while the holding area is cleaned (yes, no) |
| `Sprinkler` | Whether sprinklers are used in the holding area (yes, no) |
| `Sprinkler_3_days` | Whether sprinklers were used in the 3 days prior to sampling (yes, no) |

# Uses

The dataset was originally used to fit random forest models predicting 11 bulk tank raw milk quality outcomes from farm management and characteristic variables. It can also be used for research in dairy food safety, agricultural microbiology, farm management optimization, and longitudinal mixed-effects modeling.

Use the **"Use this dataset"** button at the top of the page to load the dataset into your preferred library.
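Before modeling, it may help to confirm how the count columns relate to their log₁₀ counterparts and to identify imputed `psc` readings. The sketch below uses only the standard library. The helper names (`log10_count`, `is_imputed_psc`, `max_log_mismatch`) are hypothetical, the demo rows are made up, and two behaviors are assumptions rather than documented facts: treating a stored `psc` of exactly 5 as a censored reading, and returning `None` for missing or non-positive counts.

```python
import math

# From the card: psc is left-censored at 20 MPN/L and censored values are stored as 5 MPN/L.
PSC_IMPUTED_VALUE = 5


def _is_missing(value):
    """True for None or a float NaN (how pandas usually represents missing numbers)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def log10_count(value):
    """Return log10 of a positive count, or None when it is missing or not positive.

    Returning None in those cases is an assumption of this sketch; the card does
    not say how zero or missing counts are handled.
    """
    if _is_missing(value) or value <= 0:
        return None
    return math.log10(value)


def is_imputed_psc(psc):
    """Assumption: a stored psc equal to the imputed value marks a censored reading."""
    return psc == PSC_IMPUTED_VALUE


def max_log_mismatch(rows, raw_col, log_col):
    """Largest absolute gap between log10(raw_col) and the stored log_col."""
    gaps = []
    for row in rows:
        expected = log10_count(row.get(raw_col))
        stored = row.get(log_col)
        if expected is not None and not _is_missing(stored):
            gaps.append(abs(expected - stored))
    return max(gaps) if gaps else 0.0


# Made-up rows for illustration only; these are not real records.
demo_rows = [
    {"psc": 5, "logpsc": math.log10(5)},
    {"psc": 200, "logpsc": math.log10(200)},
]

print(max_log_mismatch(demo_rows, "psc", "logpsc"))
print([is_imputed_psc(row["psc"]) for row in demo_rows])
```

A mismatch near zero on the real table would suggest the log columns are plain log₁₀ transforms of the stored counts; a larger gap would point to an offset or different handling that the card does not describe.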
To load and prepare the data:

```python
import pandas as pd  # pandas must be installed for to_pandas()
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (it has a single "train" split)
ds = load_dataset("food-ai-nexus/raw-milk-quality-nys-farms")
# Convert the train split to a pandas DataFrame
df = ds["train"].to_pandas()
```

# License

This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). It is intended for research and educational use. Please cite the associated publication when using this dataset.

# Reference

```bibtex
@article{shaposhnikov2026sporeformers,
  title={From sporeformers to sensory: Measures of bulk tank raw milk quality are impacted by dairy farm characteristics and management practices},
  author={Shaposhnikov, M.M. and Weachock, R.L. and Wasserlauf-Pepper, Z.D. and Qian, C. and Barbano, D.M. and Martin, N.H.},
  journal={Journal of Dairy Science},
  year={2026},
  note={In Press},
  doi={10.3168/jds.2025-27772}
}
```
Provider:
food-ai-nexus