TheKernel01/Tiny-GenImage
Source: Hugging Face. Updated 2026-04-03; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/TheKernel01/Tiny-GenImage
Resource description:
---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: label
    dtype:
      class_label:
        names:
          '0': real
          '1': fake
  - name: generator
    dtype:
      class_label:
        names:
          '0': Real
          '1': ADM
          '2': BigGAN
          '3': GLIDE
          '4': Midjourney
          '5': SD14
          '6': SD15
          '7': VQDM
          '8': Wukong
  splits:
  - name: train
    num_bytes: 6558732103
    num_examples: 28000
  - name: validation
    num_bytes: 1748767328
    num_examples: 7000
  download_size: 8359198723
  dataset_size: 8307499431
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
license: cc-by-nc-sa-4.0
task_categories:
- image-classification
language:
- en
---
# Tiny GenImage Dataset
## 📝 Dataset Description
### Dataset Summary
The Tiny GenImage Dataset is a curated, scaled-down collection of images and labels for training, validating, and benchmarking models that detect AI-generated images and identify their source. It mixes real-world images with images from eight generators: the diffusion models ADM, GLIDE, Midjourney, Stable Diffusion 1.4 and 1.5, VQDM, and Wukong, plus the GAN-based BigGAN.
Each image carries two labels, `label` and `generator`, which support two classification tasks: binary real/fake classification and multi-class source identification.
### Supported Tasks and Leaderboards
The dataset supports two image classification tasks:
|**Task ID**|**Task Name**|**Description**|**Output Classes**|
|---|---|---|---|
|**Task A**|Binary Veracity Classification|Classifying images as either real or fake.|2 (real, fake)|
|**Task B**|AI Model Source Identification|Identifying the source of each image: Real, or the specific AI generation model that produced it.|9 (Real, ADM, BigGAN, GLIDE, Midjourney, SD14, SD15, VQDM, Wukong)|
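
Because the Task B classes include Real, a 9-way source prediction can be collapsed into a Task A prediction. The sketch below assumes predictions and targets arrive as plain lists of integer class ids and scores both tasks with simple accuracy. The helper names and the sample predictions are illustrative only; they are not part of the dataset or of any reported result.

```python
from typing import List, Sequence

REAL_GENERATOR_ID = 0  # 'Real' in the generator class list


def accuracy(predictions: Sequence[int], targets: Sequence[int]) -> float:
    """Fraction of positions where the prediction equals the target."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if len(targets) == 0:
        raise ValueError("cannot score an empty batch")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


def to_binary(generator_ids: Sequence[int]) -> List[int]:
    """Collapse generator ids to Task A labels: 0 (real) or 1 (fake)."""
    return [0 if g == REAL_GENERATOR_ID else 1 for g in generator_ids]


# Hypothetical targets and predictions for five images (not real model output).
generator_targets = [0, 4, 5, 6, 2]
generator_preds = [0, 4, 6, 6, 0]

task_b_accuracy = accuracy(generator_preds, generator_targets)
task_a_accuracy = accuracy(to_binary(generator_preds), to_binary(generator_targets))
```

Confusing SD14 with SD15 costs Task B accuracy but not Task A accuracy, since both are fake; mistaking a BigGAN image for Real costs both.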
### Languages
The descriptive text, including all class labels and metadata, is in English (en).
## 🗂️ Data Splits
The dataset is divided into training and validation splits to facilitate standard machine learning workflows.
|**Split**|**Number of Instances**|**Notes**|
|---|---|---|
|**train**|28,000|Used for model training and weight optimization.|
|**validation**|7,000|Used for hyperparameter tuning and intermediate model evaluation.|
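
The split figures can be cross-checked against the card's metadata with a few lines of arithmetic. This sketch hard-codes the `num_examples`, `num_bytes`, and `dataset_size` values from the front matter; the dictionary layout is only an illustration.

```python
# Figures copied from the dataset card's front matter.
SPLITS = {
    "train": {"num_examples": 28_000, "num_bytes": 6_558_732_103},
    "validation": {"num_examples": 7_000, "num_bytes": 1_748_767_328},
}
DATASET_SIZE = 8_307_499_431  # dataset_size from the front matter

total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

for name, split in SPLITS.items():
    share = split["num_examples"] / total_examples
    avg_kib = split["num_bytes"] / split["num_examples"] / 1024
    print(f"{name}: {share:.0%} of examples, ~{avg_kib:.0f} KiB per example")

# The per-split byte counts should add up to the reported dataset size.
assert total_bytes == DATASET_SIZE
```

The counts work out to an 80/20 split over 35,000 instances.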
## 💾 Dataset Structure
### Data Instances
A single data instance consists of an image file and two distinct labels detailing its source and authenticity.
|**Field Name**|**Example Value**|**Description**|
|---|---|---|
|**image**|`<PIL.Image.Image object>`|The actual image content loaded into a PIL object.|
|**label**|`1`|Binary label for authenticity (`0` = real, `1` = fake).|
|**generator**|`4`|Multi-class label for the specific generation model (or Real).|
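
A minimal sketch of reading one instance with the Hugging Face `datasets` library, assuming the repository id `TheKernel01/Tiny-GenImage` from the download link resolves on the Hub. Streaming avoids fetching the full download (roughly 8.4 GB) just to inspect a sample. `describe_example` and `first_validation_example` are illustrative helper names, not part of the dataset.

```python
GENERATOR_NAMES = ["Real", "ADM", "BigGAN", "GLIDE", "Midjourney",
                   "SD14", "SD15", "VQDM", "Wukong"]
LABEL_NAMES = ["real", "fake"]


def describe_example(example: dict) -> str:
    """Render the two integer labels of one instance as readable text."""
    label = LABEL_NAMES[example["label"]]
    generator = GENERATOR_NAMES[example["generator"]]
    return f"label={label}, generator={generator}"


def first_validation_example() -> dict:
    """Stream the first validation instance (requires network access)."""
    # Imported lazily so describe_example works without `datasets` installed.
    from datasets import load_dataset

    stream = load_dataset(
        "TheKernel01/Tiny-GenImage", split="validation", streaming=True
    )
    return next(iter(stream))


# The example values from the table above: label 1, generator 4.
print(describe_example({"image": None, "label": 1, "generator": 4}))
# prints "label=fake, generator=Midjourney"
```

With network access, `describe_example(first_validation_example())` would give the same kind of summary for an actual instance.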
### Data Fields
The dataset contains the following fields:
|**Field Name**|**Data Type**|**Description**|
|---|---|---|
|**image**|`datasets.Image()`|The actual image content (e.g., .jpg, .png).|
|**label**|`datasets.ClassLabel`|Task A: Binary label for image veracity.|
|**generator**|`datasets.ClassLabel`|Task B: Label specifying the generation source/model.|
## 🏷️ Label Definitions
The two label fields use the following strict mappings:
**`label` (Binary Veracity Classification)**
|**Label**|**Value**|**Description**|
|---|---|---|
|**real**|`0`|Image is real (a photograph or other non-AI-generated image).|
|**fake**|`1`|Image was created by an AI generation model.|
**`generator` (Model Source Identification)**
|**Label**|**Value**|**Description**|
|---|---|---|
|**Real**|`0`|Real image (no AI generation involved).|
|**ADM**|`1`|Generated by Ablated Diffusion Model (Guided Diffusion).|
|**BigGAN**|`2`|Generated by BigGAN.|
|**GLIDE**|`3`|Generated by GLIDE.|
|**Midjourney**|`4`|Generated by Midjourney.|
|**SD14**|`5`|Generated by Stable Diffusion 1.4.|
|**SD15**|`6`|Generated by Stable Diffusion 1.5.|
|**VQDM**|`7`|Generated by Vector Quantized Diffusion Model.|
|**Wukong**|`8`|Generated by the Wukong diffusion model.|
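
The two mappings suggest a consistency rule: an instance should have `label` 0 exactly when `generator` is 0. The sketch below encodes the mappings as plain dictionaries and checks that rule for one pair of ids. Treating the rule as an invariant of the data is an assumption drawn from the label descriptions; the card does not state it outright, and `labels_agree` is a hypothetical helper.

```python
LABEL_TO_ID = {"real": 0, "fake": 1}
GENERATOR_TO_ID = {
    "Real": 0, "ADM": 1, "BigGAN": 2, "GLIDE": 3, "Midjourney": 4,
    "SD14": 5, "SD15": 6, "VQDM": 7, "Wukong": 8,
}
ID_TO_GENERATOR = {v: k for k, v in GENERATOR_TO_ID.items()}


def labels_agree(label: int, generator: int) -> bool:
    """True when the binary label matches what the generator id implies."""
    if label not in LABEL_TO_ID.values():
        raise ValueError(f"unknown label id: {label}")
    if generator not in ID_TO_GENERATOR:
        raise ValueError(f"unknown generator id: {generator}")
    # Assumption: only the 'Real' generator class corresponds to label 'real'.
    if generator == GENERATOR_TO_ID["Real"]:
        implied = LABEL_TO_ID["real"]
    else:
        implied = LABEL_TO_ID["fake"]
    return label == implied
```

For example, `labels_agree(1, 4)` holds for a Midjourney image labeled fake, while `labels_agree(0, 4)` would flag a mismatch worth inspecting.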
## 🔗 Sources
- **Original dataset:** [yangsangtai/tiny-genimage (Kaggle)](https://www.kaggle.com/datasets/yangsangtai/tiny-genimage)
Provider:
TheKernel01



