
xiannovoa/poultry-weight-dataset

Source: Hugging Face | Updated: 2026-03-20 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/xiannovoa/poultry-weight-dataset
Resource description:
---
license: mit
task_categories:
- image-classification
language:
- en
pretty_name: Poultry Weight Dataset
size_categories:
- 1K<n<10K
---

# Poultry Weight Dataset

---

## DESCRIPTION

This dataset is designed for the task of **chicken weight estimation from images** using computer vision techniques.

It is built by combining two publicly available datasets:

- **Mendeley dataset**: https://data.mendeley.com/datasets/zrs8kk9dvr/1
- **Roboflow dataset**: https://universe.roboflow.com/mohamed-f-abdelshafie-yxuwl/broiler-live-weight-by-semantic-segmentation

Since both datasets share authorship and exhibit strong visual similarity, a **deduplication process** was applied to ensure data quality and avoid redundancy.

---

## DATASET CONSTRUCTION

To build the final dataset, a custom data processing pipeline was developed. The implementation is available in the project repository:

https://github.com/xiannovoa/poultry-vision-monitoring

The pipeline (`build_final_weight_dataset.py`) performs the following steps:

- Load images and labels from both source datasets
- Compute perceptual hashes (pHash) for each image
- Detect duplicate or highly similar images based on hash distance
- Remove redundant samples
- Merge both datasets into a single clean dataset

### Initial data

- 1714 images (Mendeley)
- 4344 images (Roboflow)

**Total:** 6058 images

### After deduplication

- 1297 duplicate or highly similar images removed

**Final dataset:**

- **4761 unique images**, each with an associated weight label

---

## DATA STRUCTURE

The dataset is organized as follows:

- `images/`: image samples
- `labels.csv`: weight annotations associated with each image

---

## DATA DISTRIBUTION

The dataset covers a wide range of chicken weights:

- **Min weight:** 116 g
- **Max weight:** 2093 g
- **Mean:** 471 g
- **Median:** 371 g
- **75th percentile:** 542 g
- **Number of unique weight values:** 168

### Observations

- Most samples are concentrated in the **200 g – 600 g range**
- Fewer samples exist at higher weight ranges (>1200 g)
- The dataset is therefore **imbalanced**, with more representation of early growth stages

Despite this, the dataset still covers the full growth range and provides a solid basis for training regression models.

---

## NOTES

- This dataset was developed as part of an academic project in collaboration with **Universidade de Santiago de Compostela** and **Balidea**
- Duplicate removal was essential to avoid data leakage and overfitting
- The dataset may require balancing techniques depending on the model used

---

## ACKNOWLEDGEMENTS

We acknowledge the original authors of the Mendeley and Roboflow datasets used as sources for this work.
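
---

## EXAMPLE: HASH-DISTANCE DEDUPLICATION (ILLUSTRATIVE)

The deduplication step described under DATASET CONSTRUCTION compares perceptual hashes by distance. The snippet below is a minimal sketch of that idea, not the code from `build_final_weight_dataset.py`. It assumes each image already has a 64-bit perceptual hash stored as an integer (computing the pHash itself is left to an image-hashing library), and it uses a greedy keep-first strategy. The function names, the `max_distance=4` threshold, and the example file names and hash values are all hypothetical; the source does not state which threshold or strategy the actual pipeline uses.

```python
from typing import Dict, List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes stored as integers."""
    return bin(a ^ b).count("1")


def deduplicate(
    hashes: Dict[str, int], max_distance: int = 4
) -> Tuple[List[str], List[str]]:
    """Greedy near-duplicate removal (illustrative, O(n^2) comparisons).

    `hashes` maps an image name to its perceptual hash.
    An image is kept only if its hash differs by more than
    `max_distance` bits from every image already kept.
    The threshold value is an assumption, not taken from the source.
    """
    kept: List[str] = []
    removed: List[str] = []
    for name, h in hashes.items():
        if any(hamming_distance(h, hashes[k]) <= max_distance for k in kept):
            removed.append(name)
        else:
            kept.append(name)
    return kept, removed


# Hypothetical names and hash values, for illustration only.
example = {
    "mendeley/a.jpg": 0xF0F0F0F0F0F0F0F0,
    "roboflow/a_copy.jpg": 0xF0F0F0F0F0F0F0F1,  # differs from a.jpg by 1 bit
    "roboflow/b.jpg": 0x0123456789ABCDEF,
}
kept, removed = deduplicate(example)
print("kept:", kept)
print("removed:", removed)
```

With the reported counts, the same bookkeeping gives 1714 + 4344 = 6058 source images and 6058 - 1297 = 4761 images after removal. Pairwise comparison is quadratic in the number of images, which is workable at this scale (about 6,000 images); a larger collection would likely call for an index structure over the hashes.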
Provided by:
xiannovoa