summykai/minecraft-skins-captioned-900k
Source: Hugging Face · Updated: 2025-11-30 · Indexed: 2025-12-20
Download link: https://hf-mirror.com/datasets/summykai/minecraft-skins-captioned-900k
Description:
---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: text
    dtype: string
  - name: hash
    dtype: string
  splits:
  - name: train
    num_bytes: 6418369859
    num_examples: 854116
license: mit
task_categories:
- text-to-image
- image-to-text
tags:
- minecraft
- pixel-art
- texture-generation
- steve-model
- game-assets
- procedural-generation
- diffusion
- skin-generation
- captioned
- curated
size_categories:
- 100K<n<1M
---
# 🎮 minecraft-skins-captioned-900k
> **854,116 high-quality, captioned Minecraft player skins** — deduplicated, Steve-model only, ready for text-to-image training.

## 📋 Dataset Summary
A **rigorously filtered and quality-controlled** version of [`neurlang/Minecraft-Skins-Captioned-1M`](https://huggingface.co/datasets/neurlang/Minecraft-Skins-Captioned-1M) specifically curated for training high-performance generative models that require precise UV topology constraints.
This dataset is optimized for models like **ST-DiT** (Sparse Template-Aware Diffusion Transformer) that generate Minecraft player skins with guaranteed structural validity.
## 🎯 Key Features
| Feature | Description |
|---------|-------------|
| ✅ **100% Steve Model** | All skins use the 4-pixel-wide arm format (Alex slim-arm skins removed) |
| ✅ **Quality Filtered** | Garbage, solid-color, and low-effort skins removed |
| ✅ **Deduplicated** | No duplicate images or captions |
| ✅ **Clean Captions** | All captions are meaningful and descriptive |
| ✅ **Production Ready** | Suitable for immediate training use |
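
The card does not describe the actual filtering pipeline, so the following is only a minimal sketch of what checks like those in the table could look like. All function names are hypothetical. The slim-arm test is a common heuristic, not a confirmed part of this dataset's curation, and SHA-256 is an assumed hashing choice that may differ from the dataset's own `hash` column. Each skin is represented as a flat, row-major list of RGBA tuples for a 64x64 image.

```python
import hashlib

SKIN_SIZE = 64  # 64x64 RGBA, as stated in the Quick Start comment


def is_solid_color(pixels):
    """True if every pixel has the same RGBA value."""
    return len(set(pixels)) <= 1


def is_slim_arm_layout(pixels, width=SKIN_SIZE):
    """Heuristic (assumption): a 3-pixel-arm skin leaves columns 54-55,
    rows 20-31 of the right-arm region fully transparent."""
    return all(
        pixels[y * width + x][3] == 0
        for y in range(20, 32)
        for x in range(54, 56)
    )


def pixel_hash(pixels):
    """SHA-256 over the raw RGBA bytes (assumed scheme, for illustration)."""
    raw = bytes(channel for px in pixels for channel in px)
    return hashlib.sha256(raw).hexdigest()


def filter_skins(samples):
    """Yield (pixels, caption) pairs that pass every check.

    Drops empty captions, solid-color skins, slim-arm skins,
    and any repeated image or repeated caption.
    """
    seen_images, seen_captions = set(), set()
    for pixels, caption in samples:
        caption = caption.strip()
        if not caption or is_solid_color(pixels) or is_slim_arm_layout(pixels):
            continue
        digest = pixel_hash(pixels)
        if digest in seen_images or caption in seen_captions:
            continue
        seen_images.add(digest)
        seen_captions.add(caption)
        yield pixels, caption
```

With Pillow, a flat pixel list of this shape can be obtained from a sample image, for example via `list(image.convert("RGBA").getdata())`.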
## 📊 Statistics

| Metric | Value |
|--------|-------|
| **Original Size** | 1,000,000 samples |
| **Filtered Size** | 854,116 samples |
| **Reduction** | 145,884 samples (14.6%) |
| **Average Caption Length** | 132.5 words |
| **Median Caption Length** | 126 words |
| **Dataset Size** | 6.42 GB |
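
The derived figures in this table can be re-checked from the raw counts. The short sketch below recomputes the reduction and the size in decimal gigabytes, taking `num_bytes` from the card's `dataset_info` metadata; the variable names are illustrative only.

```python
original = 1_000_000        # "Original Size" row
filtered = 854_116          # "Filtered Size" row
num_bytes = 6_418_369_859   # num_bytes from the dataset_info metadata

removed = original - filtered
reduction_pct = round(100 * removed / original, 1)
size_gb = round(num_bytes / 1e9, 2)        # decimal gigabytes
bytes_per_sample = num_bytes / filtered    # average storage per sample

print(f"removed={removed:,} ({reduction_pct}%), size={size_gb} GB")
```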
## 🌈 Content Diversity

The dataset contains diverse Minecraft player skins including:
- **Character types**: Knights, wizards, zombies, astronauts, pirates, robots
- **Themes**: Medieval, fantasy, sci-fi, modern, horror, anime
- **Styles**: Realistic, cartoon, minimalist, detailed, themed
- **Features**: Armor, capes, helmets, accessories, clothing, effects
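
One way to explore these categories is to select samples whose captions mention a given theme. The sketch below is an assumption-laden illustration: `matches_theme` and `take_themed` are hypothetical helpers, and matching is a plain whole-word comparison (with an optional plural "s"), not anything the dataset itself provides. It works on any iterable of dicts with a `text` field.

```python
import re
from itertools import islice


def matches_theme(caption, keywords):
    """True if any keyword (or its simple plural) appears as a whole word."""
    words = set(re.findall(r"[a-z]+", caption.lower()))
    return any(k.lower() in words or k.lower() + "s" in words for k in keywords)


def take_themed(samples, keywords, limit):
    """Return up to `limit` samples whose `text` caption matches a keyword.

    Stops reading `samples` as soon as `limit` matches are found,
    so it is safe to use on a streamed (lazy) iterable.
    """
    hits = (s for s in samples if matches_theme(s["text"], keywords))
    return list(islice(hits, limit))
```

Passing the streamed `train` split from the Quick Start as `samples`, for example `take_themed(dataset["train"], ["knight", "wizard"], 10)`, would collect the first ten matching skins without downloading the full dataset.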
## 💬 Caption Analysis

## ✨ Quality Curation

## 🚀 Quick Start
```python
from datasets import load_dataset

# Load the full dataset
dataset = load_dataset("summykai/minecraft-skins-captioned-900k")

# Or stream it for large-scale training (samples are fetched lazily)
dataset = load_dataset("summykai/minecraft-skins-captioned-900k", streaming=True)

# Access samples
for sample in dataset["train"]:
    image = sample["image"]   # PIL Image (64x64 RGBA)
    caption = sample["text"]  # Natural language description
    break
```

Provider: summykai



