
summykai/minecraft-skins-captioned-900k

Source: Hugging Face. Updated 2025-11-30. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/summykai/minecraft-skins-captioned-900k
Overview:
---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: text
    dtype: string
  - name: hash
    dtype: string
  splits:
  - name: train
    num_bytes: 6418369859
    num_examples: 854116
license: mit
task_categories:
- text-to-image
- image-to-text
tags:
- minecraft
- pixel-art
- texture-generation
- steve-model
- game-assets
- procedural-generation
- diffusion
- skin-generation
- captioned
- curated
size_categories:
- 100K<n<1M
---

# 🎮 minecraft-skins-captioned-900k

> **854,116 high-quality, captioned Minecraft player skins**: deduplicated, Steve-model only, ready for text-to-image training.

![Dataset Preview](assets/hero_grid.png)

## 📋 Dataset Summary

A **rigorously filtered and quality-controlled** version of [`neurlang/Minecraft-Skins-Captioned-1M`](https://huggingface.co/datasets/neurlang/Minecraft-Skins-Captioned-1M), curated for training high-performance generative models that require precise UV topology constraints.

This dataset is optimized for models like **ST-DiT** (Sparse Template-Aware Diffusion Transformer) that generate Minecraft player skins with guaranteed structural validity.

## 🎯 Key Features

| Feature | Description |
|---------|-------------|
| ✅ **100% Steve Model** | All skins use 4-pixel-wide arm format (Alex removed) |
| ✅ **Quality Filtered** | Garbage, solid-color, and low-effort skins removed |
| ✅ **Deduplicated** | No duplicate images or captions |
| ✅ **Clean Captions** | All captions are meaningful and descriptive |
| ✅ **Production Ready** | Suitable for immediate training use |

## 📊 Statistics

![Statistics](assets/statistics.png)

| Metric | Value |
|--------|-------|
| **Original Size** | 1,000,000 samples |
| **Filtered Size** | 854,116 samples |
| **Reduction** | 145,884 samples (14.6%) |
| **Average Caption Length** | 132.5 words |
| **Median Caption Length** | 126 words |
| **Dataset Size** | 6.42 GB |

## 🌈 Content Diversity

![Diversity Showcase](assets/diversity_showcase.png)

The dataset contains diverse Minecraft player skins including:

- **Character types**: Knights, wizards, zombies, astronauts, pirates, robots
- **Themes**: Medieval, fantasy, sci-fi, modern, horror, anime
- **Styles**: Realistic, cartoon, minimalist, detailed, themed
- **Features**: Armor, capes, helmets, accessories, clothing, effects

## 💬 Caption Analysis

![Word Cloud](assets/wordcloud.png)

## ✨ Quality Curation

![Before After](assets/before_after.png)

## 🚀 Quick Start

```python
from datasets import load_dataset

# Load full dataset
dataset = load_dataset("summykai/minecraft-skins-captioned-900k")

# Load with streaming for large-scale training
dataset = load_dataset("summykai/minecraft-skins-captioned-900k", streaming=True)

# Access samples
for sample in dataset["train"]:
    image = sample["image"]   # PIL Image (64x64 RGBA)
    caption = sample["text"]  # Natural language description
    break
```
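
The card states that the dataset contains no duplicate images or captions and exposes a `hash` column, but it does not describe how deduplication was performed. The following is a minimal sketch of one way to re-check that property on a slice of rows. It assumes that `hash` identifies the image content, and the helper name `iter_unique` and the sample rows are hypothetical, not part of the dataset or its tooling.

```python
from typing import Iterable, Iterator


def iter_unique(samples: Iterable[dict]) -> Iterator[dict]:
    """Yield only samples whose `hash` and `text` have not been seen before.

    Assumption: `hash` identifies the image content, so a repeated hash
    means a repeated image. The dataset card does not confirm this.
    """
    seen_hashes: set[str] = set()
    seen_captions: set[str] = set()
    for sample in samples:
        image_hash, caption = sample["hash"], sample["text"]
        if image_hash in seen_hashes or caption in seen_captions:
            continue  # skip a repeated image hash or a repeated caption
        seen_hashes.add(image_hash)
        seen_captions.add(caption)
        yield sample


# Tiny in-memory stand-in for dataset rows (hypothetical values).
rows = [
    {"hash": "a1", "text": "a knight in silver armor"},
    {"hash": "a1", "text": "a knight wearing silver armor"},  # repeated hash
    {"hash": "b2", "text": "a knight in silver armor"},       # repeated caption
    {"hash": "c3", "text": "a wizard in a purple robe"},
]
unique_rows = list(iter_unique(rows))
```

Because the helper accepts any iterable of dict-like rows, it could in principle be pointed at a streamed slice of the `train` split; if the card's deduplication claim holds, every row in that slice would pass through unchanged.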
Provider:
summykai