xiannovoa/poultry-weight-dataset

Name: xiannovoa/poultry-weight-dataset
Creator: xiannovoa
Published: 2026-03-20 11:00:00
License: No description available

Hugging Face, updated 2026-03-20, indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/xiannovoa/poultry-weight-dataset


Resource summary:

---
license: mit
task_categories:
- image-classification
language:
- en
pretty_name: Poultry Weight Dataset
size_categories:
- 1K<n<10K
---

# Poultry Weight Dataset

---

## DESCRIPTION

This dataset is designed for **estimating chicken weight from images** using computer vision techniques. It was built by combining two publicly available datasets:

- **Mendeley dataset**: https://data.mendeley.com/datasets/zrs8kk9dvr/1
- **Roboflow dataset**: https://universe.roboflow.com/mohamed-f-abdelshafie-yxuwl/broiler-live-weight-by-semantic-segmentation

Because both datasets share authors and are visually very similar, a **deduplication process** was applied to ensure data quality and remove redundant images.

---

## DATASET CONSTRUCTION

The final dataset was built with a custom data processing pipeline, available in the project repository: https://github.com/xiannovoa/poultry-vision-monitoring

The pipeline (`build_final_weight_dataset.py`) performs the following steps:

- Load images and labels from both source datasets
- Compute a perceptual hash (pHash) for each image
- Detect duplicate or highly similar images based on hash distance
- Remove redundant samples
- Merge both datasets into a single clean dataset

### Initial data

- 1714 images (Mendeley)
- 4344 images (Roboflow)

**Total:** 6058 images

### After deduplication

- 1297 duplicate or highly similar images removed

**Final dataset:**

- **4761 unique images**, each with an associated weight label

---

## DATA STRUCTURE

The dataset is organized as follows:

- `images/`: image samples
- `labels.csv`: weight annotations for each image

---

## DATA DISTRIBUTION

The dataset covers a wide range of chicken weights:

- **Min weight:** 116 g
- **Max weight:** 2093 g
- **Mean:** 471 g
- **Median:** 371 g
- **75th percentile:** 542 g
- **Number of unique weight values:** 168

### Observations

- Most samples are concentrated in the **200 g to 600 g range**
- Fewer samples exist at higher weights (above 1200 g)
- The dataset is therefore **imbalanced**, with early growth stages over-represented; the mean (471 g) sitting well above the median (371 g) reflects this right skew

Despite this imbalance, the dataset covers the full growth range (116 g to 2093 g) and provides a solid basis for training regression models.

---

## NOTES

- This dataset was developed as part of an academic project in collaboration with **Universidade de Santiago de Compostela** and **Balidea**
- Removing duplicates was essential to avoid data leakage (near-identical images ending up in both training and evaluation splits) and overfitting
- Depending on the model used, balancing techniques may be needed to compensate for the skewed weight distribution

---

## ACKNOWLEDGEMENTS

We acknowledge the original authors of the Mendeley and Roboflow datasets used as sources for this work.
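The deduplication steps listed under DATASET CONSTRUCTION (pHash, hash distance, removal of redundant samples) can be sketched as follows. This is not the code of `build_final_weight_dataset.py`, which lives in the linked repository. It is a minimal illustration that assumes each image has already been converted to a 32x32 grayscale array (for example with Pillow's `Image.convert("L").resize((32, 32))`). The sketch follows the common pHash recipe: take the DCT, keep the 8x8 lowest frequencies, and set one bit per coefficient above their median. It then drops images whose hashes lie within a Hamming-distance threshold of an image already kept. The function names and the `max_distance=5` threshold are hypothetical, because the card does not state the threshold the pipeline used.

```python
import math
import statistics
from typing import Dict, List, Sequence

IMG_SIZE = 32   # assumed input: image already converted to 32x32 grayscale
HASH_SIZE = 8   # keep the 8x8 lowest DCT frequencies -> 64-bit hash

# DCT-II basis restricted to the frequencies we keep:
# COS[u][i] = cos(pi * (2i + 1) * u / (2 * IMG_SIZE))
COS = [
    [math.cos(math.pi * (2 * i + 1) * u / (2 * IMG_SIZE)) for i in range(IMG_SIZE)]
    for u in range(HASH_SIZE)
]


def phash(pixels: Sequence[Sequence[float]]) -> int:
    """64-bit perceptual hash of a 32x32 grayscale image (rows of pixel values)."""
    if len(pixels) != IMG_SIZE or any(len(row) != IMG_SIZE for row in pixels):
        raise ValueError("expected a 32x32 grayscale image")
    # Separable 2D DCT-II: first transform along the row index i ...
    partial = [
        [sum(COS[u][i] * pixels[i][j] for i in range(IMG_SIZE)) for j in range(IMG_SIZE)]
        for u in range(HASH_SIZE)
    ]
    # ... then along the column index j, giving the 8x8 low-frequency block.
    coeffs = [
        sum(partial[u][j] * COS[v][j] for j in range(IMG_SIZE))
        for u in range(HASH_SIZE)
        for v in range(HASH_SIZE)
    ]
    # One bit per coefficient: 1 if it lies above the median of the block.
    median = statistics.median(coeffs)
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | int(c > median)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def deduplicate(hashes: Dict[str, int], max_distance: int = 5) -> List[str]:
    """Greedy near-duplicate removal in insertion order: an image is kept only if
    its hash differs from every already-kept hash by more than max_distance bits.
    max_distance=5 is a hypothetical default, not the pipeline's documented value."""
    kept: List[str] = []
    for name, h in hashes.items():
        if all(hamming(h, hashes[k]) > max_distance for k in kept):
            kept.append(name)
    return kept
```

Because a uniform brightness change only alters the DC coefficient, the hash above is unchanged by it. This is the kind of robustness that lets pHash flag re-exported or lightly edited copies. The greedy loop compares each image against every kept image, which is quadratic but still manageable at roughly 6000 images. In practice, a library such as `imagehash` (`imagehash.phash`) provides a ready-made hash.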

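The statistics under DATA DISTRIBUTION, and the balancing mentioned in NOTES, can be reproduced along the following lines. This is a minimal sketch. It assumes `labels.csv` has a header row and a weight column in grams named `weight_g`. That column name is hypothetical, since the card does not document the CSV schema. Percentile conventions differ between tools, so a recomputed 75th percentile may not match the reported 542 g exactly. For balancing, the sketch uses inverse-frequency weighting over fixed-width weight bins. This is one common option, not the authors' documented method, and the 200 g bin width is an arbitrary illustrative choice.

```python
import csv
import statistics
from collections import Counter
from typing import Dict, List, TextIO

# Hypothetical column name: the card does not document the labels.csv schema.
WEIGHT_COL = "weight_g"


def load_weights(f: TextIO, weight_col: str = WEIGHT_COL) -> List[float]:
    """Read per-image weights (grams) from a labels.csv-style file with a header row."""
    return [float(row[weight_col]) for row in csv.DictReader(f)]


def summarize(weights: List[float]) -> Dict[str, float]:
    """Statistics of the kind listed under DATA DISTRIBUTION."""
    return {
        "min": min(weights),
        "max": max(weights),
        "mean": statistics.mean(weights),
        "median": statistics.median(weights),
        # Percentile conventions vary; "inclusive" interpolates between sorted values.
        "p75": statistics.quantiles(weights, n=4, method="inclusive")[-1],
        "unique": len(set(weights)),
    }


def inverse_frequency_weights(weights: List[float], bin_width: float = 200.0) -> List[float]:
    """Per-sample weights proportional to 1 / (samples in the same weight bin),
    rescaled to average 1.0, so sparsely populated heavy-bird bins count more."""
    bins = [int(w // bin_width) for w in weights]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    scale = len(raw) / sum(raw)
    return [r * scale for r in raw]


# Example (assumes labels.csv is in the working directory):
# with open("labels.csv", newline="") as f:
#     weights = load_weights(f)
# print(summarize(weights))
```

The resulting per-sample weights can multiply a regression loss, or serve as sampling probabilities (for example with PyTorch's `torch.utils.data.WeightedRandomSampler`). Either use gives the rarer samples above 1200 g more influence during training.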
Provider:

xiannovoa
