xiannovoa/poultry-weight-dataset
Source: Hugging Face · Updated: 2026-03-20 · Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/xiannovoa/poultry-weight-dataset
Description:
---
license: mit
task_categories:
- image-classification
language:
- en
pretty_name: Poultry Weight Dataset
size_categories:
- 1K<n<10K
---
# Poultry Weight Dataset
---
## DESCRIPTION
This dataset is designed for the task of **chicken weight estimation from images** using computer vision techniques.
It is built by combining two publicly available datasets:

- **Mendeley dataset**: https://data.mendeley.com/datasets/zrs8kk9dvr/1
- **Roboflow dataset**: https://universe.roboflow.com/mohamed-f-abdelshafie-yxuwl/broiler-live-weight-by-semantic-segmentation

Since both datasets share authorship and exhibit strong visual similarity, a **deduplication process** was applied to ensure data quality and avoid redundancy.

---
## DATASET CONSTRUCTION
To build the final dataset, a custom data processing pipeline was developed. The implementation is available in the project repository:
https://github.com/xiannovoa/poultry-vision-monitoring
The pipeline (`build_final_weight_dataset.py`) performs the following steps:
- Load images and labels from both source datasets
- Compute perceptual hashes (pHash) for each image
- Detect duplicate or highly similar images based on hash distance
- Remove redundant samples
- Merge both datasets into a single clean dataset
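
The duplicate-detection step above can be illustrated with a minimal sketch. It is not the code from `build_final_weight_dataset.py`. It assumes each image has already been reduced to an integer perceptual hash (for example with the `phash` function of the third-party `imagehash` package), and it uses a hypothetical threshold of 4 differing bits; the threshold actually used by the pipeline is not stated here. The file names are made up for illustration.

```python
from typing import Dict, List


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two integer hashes."""
    return bin(a ^ b).count("1")


def deduplicate(hashes: Dict[str, int], max_distance: int = 4) -> List[str]:
    """Greedy pass: keep an image only if its hash is more than
    max_distance bits away from every image already kept.

    max_distance is an assumed value, not the pipeline's setting.
    """
    kept: List[str] = []
    for name, h in hashes.items():
        if all(hamming_distance(h, hashes[k]) > max_distance for k in kept):
            kept.append(name)
    return kept


# Hypothetical file names and 16-bit hashes, for illustration only.
example = {
    "mendeley_001.jpg": 0b1111000011110000,
    "roboflow_117.jpg": 0b1111000011110001,  # 1 bit away: near-duplicate
    "roboflow_305.jpg": 0b0000111100001111,  # 16 bits away: distinct
}
unique = deduplicate(example)
```

With this input, `unique` keeps the first and third images and drops the near-duplicate. The greedy comparison is quadratic in the number of kept images, which is acceptable for a few thousand samples.
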
### Initial data
- 1714 images (Mendeley)
- 4344 images (Roboflow)

**Total:** 6058 images
### After deduplication
- 1297 duplicate or highly similar images removed

**Final dataset:**
- **4761 unique images** (6058 - 1297), each with an associated weight label
---
## DATA STRUCTURE
The dataset is organized as follows:
- `images/`: image samples
- `labels.csv`: weight annotations associated with each image
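
Pairing each image with its label might look like the sketch below. The column names `filename` and `weight` are assumptions, since the header of `labels.csv` is not documented here; check the actual file and adjust the parameters. The sample rows are invented so that the example runs without the dataset.

```python
import csv
import io
from pathlib import Path
from typing import List, TextIO, Tuple


def load_labels(
    csv_file: TextIO,
    images_dir: str = "images",
    name_col: str = "filename",  # assumed column name
    weight_col: str = "weight",  # assumed column name, in grams
) -> List[Tuple[Path, float]]:
    """Return (image path, weight in grams) pairs from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return [
        (Path(images_dir) / row[name_col], float(row[weight_col]))
        for row in reader
    ]


# In-memory stand-in for labels.csv with made-up rows.
sample = io.StringIO("filename,weight\nimg_0001.jpg,371\nimg_0002.jpg,542\n")
pairs = load_labels(sample)
```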
---
## DATA DISTRIBUTION
The dataset covers a wide range of chicken weights:
- **Min weight:** 116 g
- **Max weight:** 2093 g
- **Mean:** 471 g
- **Median:** 371 g
- **75th percentile:** 542 g
- **Number of unique weight values:** 168
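
The statistics above can be recomputed from the weight column with the Python standard library. The sketch below runs on a small made-up list, not the real labels, and it assumes the "inclusive" quantile method; the method used to obtain the 75th percentile reported above is not stated.

```python
import statistics
from typing import Dict, List


def summarize(weights_g: List[float]) -> Dict[str, float]:
    """Summary statistics for a list of weights in grams."""
    quartiles = statistics.quantiles(weights_g, n=4, method="inclusive")
    return {
        "min": min(weights_g),
        "max": max(weights_g),
        "mean": statistics.mean(weights_g),
        "median": statistics.median(weights_g),
        "p75": quartiles[2],  # third quartile
        "unique_values": len(set(weights_g)),
    }


# Toy values for illustration only.
toy_weights = [116, 250, 371, 371, 542, 2093]
summary = summarize(toy_weights)
```
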
### Observations
- Most samples are concentrated in the **200 g – 600 g range**
- Fewer samples exist at higher weight ranges (>1200 g)
- The dataset is therefore **imbalanced**, with more representation of early growth stages; the mean (471 g) sitting well above the median (371 g) reflects this right skew

Despite this, the dataset spans weights from 116 g to 2093 g and provides a solid basis for training regression models.

---
## NOTES
- This dataset was developed as part of an academic project in collaboration with **Universidade de Santiago de Compostela** and **Balidea**
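- Duplicate removal was essential to avoid data leakage and overfitting: near-identical images that land on both sides of a train/test split would inflate evaluation scores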
- The dataset may require balancing techniques depending on the model used
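
One possible balancing technique is inverse-frequency sample weighting over weight bins, sketched below. This is an illustration, not part of the published pipeline: the 200 g bin width is an arbitrary assumption and the input values are made up. The resulting per-sample weights could be fed to a weighted sampler or a weighted loss in the training framework of choice.

```python
from collections import Counter
from typing import List


def inverse_frequency_weights(
    weights_g: List[float], bin_width_g: float = 200.0
) -> List[float]:
    """Give each sample a weight inversely proportional to the number
    of samples in its weight bin, normalized so the mean weight is 1.

    bin_width_g is an assumed value chosen for illustration.
    """
    bins = [int(w // bin_width_g) for w in weights_g]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    scale = len(raw) / sum(raw)
    return [r * scale for r in raw]


# Toy values: four light birds in one bin, one heavy bird alone in another.
toy_weights = [250, 300, 371, 390, 1500]
sample_weights = inverse_frequency_weights(toy_weights)
```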
---
## ACKNOWLEDGEMENTS
We acknowledge the original authors of the Mendeley and Roboflow datasets used as sources for this work.
Provider:
xiannovoa



