Prentz/Unified-Animals-Dataset
Source: Hugging Face. Updated 2026-04-07. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Prentz/Unified-Animals-Dataset
Resource description:
---
license: apache-2.0
language:
- en
tags:
- wildlife
- Animals
- image
- classification
pretty_name: Unified Animals Dataset
size_categories:
- 10K<n<100K
---
# Unified Animal Dataset
This dataset contains a large-scale collection of animal images designed for multi-class image classification tasks. It includes **57,329 images across 216 animal categories** and is suitable for training, evaluation, and benchmarking of computer vision models.
---
## Dataset Details
### Dataset Description
The Unified Animal Dataset is a curated multi-source dataset combining several publicly available animal image datasets into a unified classification benchmark. It spans a wide range of animal types including mammals, birds, insects, and marine species.
The dataset is pre-structured into training, validation, and test splits, making it directly usable for machine learning workflows.
* **Curated by:** Prentz
* **Shared by:** Prentz
* **License:** Apache 2.0
---
### Dataset Sources
The dataset is assembled from multiple sources, including:
* Stanford Dogs dataset
* Ultralytics datasets
* Additional animal datasets sourced from Kaggle
---
## Uses
### Direct Use
This dataset is intended for:
* Multi-class animal image classification
* Training and fine-tuning deep learning models
* Transfer learning experiments
* Benchmarking computer vision architectures
* Educational purposes in machine learning
---
### Out-of-Scope Use
This dataset is not suitable for:
* Human-related tasks (e.g., facial recognition, demographic prediction)
* Object detection or segmentation tasks (no bounding box annotations)
* Applications that need class-balanced data, unless the imbalance is corrected during preprocessing
---
## Dataset Structure
The dataset follows a standard image classification folder structure:
```
data/
├── train/
│   ├── class_1/
│   ├── class_2/
│   └── ...
├── val/
│   └── ...
└── test/
    └── ...
```
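As a quick sanity check after downloading, the layout above can be walked to count images per split and class. This is a minimal sketch using only the Python standard library; the root folder name `data`, the helper name `count_images`, and the list of image extensions are assumptions for illustration, not part of the dataset's own tooling.

```python
from collections import Counter
from pathlib import Path

# Assumption: files with these extensions are the dataset's images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def count_images(root, splits=("train", "val", "test")):
    """Return {split: Counter({class_name: image_count})} for a data/ tree."""
    counts = {}
    for split in splits:
        split_dir = Path(root) / split
        per_class = Counter()
        if split_dir.is_dir():
            # Each sub-folder of a split is treated as one class.
            for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
                per_class[class_dir.name] = sum(
                    1
                    for f in class_dir.iterdir()
                    if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
                )
        counts[split] = per_class
    return counts


if __name__ == "__main__":
    # "data" is the root folder shown above; adjust to your download location.
    for split, per_class in count_images("data").items():
        print(split, len(per_class), "classes,", sum(per_class.values()), "images")
```

If the download is complete, the per-split totals should match the split table in this card.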
### Statistics
* **Total Images:** 57,329
* **Total Classes:** 216
* **Dataset Size:** 4.14 GB
### Split Distribution
| Split | Images | Percentage |
| ---------- | ------ | ---------- |
| Train | 28,637 | 50.0% |
| Validation | 11,415 | 19.9% |
| Test | 17,277 | 30.1% |
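The percentages in the table follow directly from the image counts. A small sketch of the arithmetic, using only the numbers stated above:

```python
# Image counts per split, copied from the table above.
split_counts = {"Train": 28_637, "Validation": 11_415, "Test": 17_277}

total = sum(split_counts.values())
percentages = {
    name: round(100 * count / total, 1) for name, count in split_counts.items()
}

print(total)        # 57329, matching the stated total
print(percentages)  # {'Train': 50.0, 'Validation': 19.9, 'Test': 30.1}
```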
### Class Distribution
* Average images per class: ~265
* Most represented class: more than 4,800 images
* Least represented classes: ~60 images
* Imbalance ratio (largest class ÷ smallest class): ~81×
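These figures can be recomputed from per-class counts. The sketch below shows one way to do it; `imbalance_summary` is a hypothetical helper, and the three class names and counts in the example are made-up placeholders chosen only to illustrate the calculation, not actual classes from the dataset.

```python
from statistics import mean


def imbalance_summary(class_counts):
    """Summarise a {class_name: image_count} mapping."""
    largest = max(class_counts.values())
    smallest = min(class_counts.values())
    return {
        "mean": mean(class_counts.values()),
        "largest": largest,
        "smallest": smallest,
        # Imbalance ratio = largest class size divided by smallest class size.
        "ratio": largest / smallest,
    }


# The average quoted above is simply total images / total classes.
average_per_class = round(57_329 / 216, 1)  # about 265.4

# Placeholder counts (NOT real dataset values) to show the ratio calculation.
example = imbalance_summary({"class_a": 4_860, "class_b": 300, "class_c": 60})
```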
---
## Dataset Creation
### Curation Rationale
The dataset was created to provide a large and diverse benchmark for animal classification tasks, combining multiple datasets into a single unified structure. The goal was to enable efficient experimentation and model development without requiring users to manually merge datasets.
---
### Source Data
#### Data Collection and Processing
Data was collected from multiple publicly available datasets and organized into a unified structure. The processing pipeline included:
* Merging datasets into a consistent directory format
* Removing corrupted or unreadable images
* Ensuring class consistency across splits
* Splitting into train, validation, and test sets
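The class-consistency step can be verified on a local copy. The sketch below is not the author's pipeline; it is a hypothetical check, assuming the `train/`, `val/`, `test/` folder layout described in this card, that reports any class folder missing from a split.

```python
from pathlib import Path


def class_names(split_dir):
    """Return the set of class folder names inside one split directory."""
    split_dir = Path(split_dir)
    if not split_dir.is_dir():
        return set()
    return {p.name for p in split_dir.iterdir() if p.is_dir()}


def find_split_mismatches(root, splits=("train", "val", "test")):
    """Return {split: [classes missing from that split]}; empty if consistent."""
    per_split = {s: class_names(Path(root) / s) for s in splits}
    all_classes = set().union(*per_split.values())
    return {
        split: sorted(all_classes - names)
        for split, names in per_split.items()
        if all_classes - names
    }
```

An empty result from `find_split_mismatches("data")` means every split contains the same set of class folders.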
#### Source Data Producers
The original data was created by multiple dataset providers, including academic datasets and open-source contributors.
---
### Annotations
#### Annotation Process
Annotations are provided as **class labels via directory structure**. Each image belongs to exactly one class, determined by its folder.
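Because the label is encoded in the folder name, a labelled sample list can be built without any annotation file. The following is a minimal standard-library sketch; `build_samples` is a hypothetical helper, and mapping sorted folder names to integer indices is an assumed convention (it is the same convention that folder-based loaders such as torchvision's `ImageFolder` use).

```python
from pathlib import Path


def build_samples(split_dir):
    """Return ([(image_path, label_index), ...], {class_name: label_index})."""
    split_dir = Path(split_dir)
    # Sorted folder names give a stable class-to-index mapping.
    classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    samples = [
        (path, class_to_idx[name])
        for name in classes
        for path in sorted((split_dir / name).iterdir())
        if path.is_file()
    ]
    return samples, class_to_idx
```

For example, `build_samples("data/train")` would return every training image paired with the index of its parent folder.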
#### Annotators
Annotations originate from the original datasets used (e.g., Stanford Dogs and others). No additional manual relabeling was performed beyond dataset organization.
#### Personal and Sensitive Information
The dataset does not contain personal or sensitive human data. It consists solely of animal images.
---
## Bias, Risks, and Limitations
* **Class imbalance:** Significant imbalance (~81×) between largest and smallest classes
* **Resolution variability:** Images vary widely in size and quality
* **Dataset bias:** Over-representation of certain animals (e.g., dogs, spiders)
* **Domain limitation:** Performance may degrade on out-of-distribution images
---
### Recommendations
Users should consider:
* Applying class weighting or focal loss
* Using data augmentation techniques
* Standardizing image resolution (e.g., 224×224)
* Evaluating per-class performance, not just overall accuracy
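Two of these recommendations can be sketched in a few lines of plain Python. The weighting below is the common "balanced" inverse-frequency heuristic, `total / (num_classes * count)`, which is one option among several and not something this card prescribes; both function names are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(class_counts):
    """Weight each class by total / (num_classes * count).

    A perfectly balanced dataset yields a weight of 1.0 for every class;
    rare classes receive proportionally larger weights.
    """
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {
        name: total / (num_classes * count) for name, count in class_counts.items()
    }


def per_class_accuracy(y_true, y_pred):
    """Return {class: fraction of that class's samples predicted correctly}."""
    seen = Counter()
    correct = Counter()
    for truth, pred in zip(y_true, y_pred):
        seen[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {name: correct[name] / seen[name] for name in seen}
```

The weights can be passed to a weighted loss in the training framework of your choice. Reporting `per_class_accuracy` alongside overall accuracy makes it visible when small classes are being ignored.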
---
## Citation
**BibTeX:**
```
@dataset{Unified_animals_2026,
title={Unified Animal Dataset (216 Classes)},
author={Prentz},
year={2026}
}
```
**APA:**
Prentz. (2026). *Unified Animal Dataset (216 Classes).*
---
## More Information
This dataset is intended for practical machine learning workflows and experimentation. Users should carefully handle preprocessing and class imbalance when training models.
---
## Dataset Card Authors
Prentz (author)
---
## Dataset Card Contact
none
Provider:
Prentz



