
Prentz/Unified-Animals-Dataset

Source: Hugging Face. Updated 2026-04-07; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Prentz/Unified-Animals-Dataset
Resource overview:
---
license: apache-2.0
language:
- en
tags:
- wildlife
- Animals
- image
- classification
pretty_name: Unified Animals Dataset
size_categories:
- 10K<n<100K
---

# Unified Animal Dataset

This dataset contains a large-scale collection of animal images designed for multi-class image classification tasks. It includes **57,329 images across 216 animal categories** and is suitable for training, evaluation, and benchmarking of computer vision models.

---

## Dataset Details

### Dataset Description

The Unified Animal Dataset is a curated multi-source dataset that combines several publicly available animal image datasets into a single classification benchmark. It spans a wide range of animal types, including mammals, birds, insects, and marine species. The dataset is pre-structured into training, validation, and test splits, making it directly usable in machine learning workflows.

* **Curated by:** Prentz
* **Shared by:** Prentz
* **License:** Apache 2.0

---

### Dataset Sources

The dataset is composed of multiple sources, including:

* Stanford Dogs dataset
* Ultralytics datasets
* Additional animal datasets sourced from Kaggle

---

## Uses

### Direct Use

This dataset is intended for:

* Multi-class animal image classification
* Training and fine-tuning deep learning models
* Transfer learning experiments
* Benchmarking computer vision architectures
* Educational purposes in machine learning

---

### Out-of-Scope Use

This dataset is not suitable for:

* Human-related tasks (e.g., facial recognition, demographic prediction)
* Object detection or segmentation tasks (no bounding box annotations)
* Applications requiring balanced datasets without preprocessing

---

## Dataset Structure

The dataset follows a standard image classification folder structure, with one subfolder per class inside each split directory:

```
data/
  train/
  val/
  test/
    class_1/
    class_2/
    ...
```

### Statistics

* **Total Images:** 57,329
* **Total Classes:** 216
* **Dataset Size:** 4.14 GB

### Split Distribution

| Split      | Images | Percentage |
| ---------- | ------ | ---------- |
| Train      | 28,637 | 50.0%      |
| Validation | 11,415 | 19.9%      |
| Test       | 17,277 | 30.1%      |

### Class Distribution

* Average images per class: ~265
* Most represented class: ~4,800+ images
* Least represented classes: ~60 images
* Imbalance ratio: ~81×

---

## Dataset Creation

### Curation Rationale

The dataset was created to provide a large and diverse benchmark for animal classification tasks by combining multiple datasets into a single unified structure. The goal was to enable efficient experimentation and model development without requiring users to merge datasets manually.

---

### Source Data

#### Data Collection and Processing

Data was collected from multiple publicly available datasets and organized into a unified structure. The processing pipeline included:

* Merging datasets into a consistent directory format
* Cleaning corrupted or unreadable images
* Ensuring class consistency across splits
* Splitting into train, validation, and test sets

#### Source Data Producers

The original data was created by multiple dataset providers, including academic datasets and open-source contributors.

---

### Annotations

#### Annotation Process

Annotations are provided as **class labels via directory structure**. Each image belongs to exactly one class, determined by its folder.

#### Annotators

Annotations originate from the original datasets used (e.g., Stanford Dogs and others). No additional manual relabeling was performed beyond dataset organization.

#### Personal and Sensitive Information

The dataset does not contain personal or sensitive human data. It consists solely of animal images.
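
Because labels are encoded only in the folder names, the per-class counts and the imbalance ratio reported above can be recomputed directly from a local copy. The following is a minimal sketch, not part of the original card: the function names, the `data/train` path in the usage note, and the set of image extensions are all assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; the card does not list the file types used.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def count_images_per_class(split_dir):
    """Count images in each class subfolder of one split directory.

    The class label is the name of the subfolder that holds the image.
    """
    counts = Counter()
    for class_dir in sorted(Path(split_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        for path in class_dir.iterdir():
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                counts[class_dir.name] += 1
    return counts


def imbalance_ratio(counts):
    """Return the size of the largest class divided by the size of the smallest."""
    return max(counts.values()) / min(counts.values())
```

For example, `counts = count_images_per_class("data/train")` followed by `imbalance_ratio(counts)` would give the ratio for the training split, assuming the dataset has been downloaded to a local `data/` directory.
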
---

## Bias, Risks, and Limitations

* **Class imbalance:** Significant imbalance (~81×) between largest and smallest classes
* **Resolution variability:** Images vary widely in size and quality
* **Dataset bias:** Over-representation of certain animals (e.g., dogs, spiders)
* **Domain limitation:** Performance may degrade on out-of-distribution images

---

### Recommendations

Users should consider:

* Applying class weighting or focal loss
* Using data augmentation techniques
* Standardizing image resolution (e.g., 224×224)
* Evaluating per-class performance, not just overall accuracy

---

## Citation

**BibTeX:**

```
@dataset{Unified_animals_2026,
  title={Unified Animal Dataset (216 Classes)},
  author={Prentz},
  year={2026}
}
```

**APA:**

Furly. (2026). *Unified Animal Dataset (216 Classes).*

---

## More Information

This dataset is intended for practical machine learning workflows and experimentation. Users should carefully handle preprocessing and class imbalance when training models.

---

## Dataset Card Authors

Prentz (author)

---

## Dataset Card Contact

none
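
As an illustration of the class-weighting recommendation above, one common heuristic is inverse-frequency weighting, where each class gets the weight `total / (num_classes * count)`. The card does not prescribe a specific weighting scheme, so this is only a sketch of one option. The function name and the two class names are hypothetical, and the counts are illustrative values near the approximate largest and smallest class sizes reported in the card.

```python
def inverse_frequency_weights(class_counts):
    """Compute one weight per class as total / (num_classes * count).

    Rare classes receive larger weights, so each class contributes
    roughly equally to a weighted loss.
    """
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {
        name: total / (num_classes * count)
        for name, count in class_counts.items()
    }


# Hypothetical class names with illustrative counts (not real dataset figures).
example_counts = {"common_class": 4800, "rare_class": 60}
example_weights = inverse_frequency_weights(example_counts)
```

The resulting dictionary can be converted into whatever form a training framework expects, for example a list of weights ordered by class index and passed to a weighted cross-entropy loss.
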
Provider:
Prentz