l2533584225/shirakami_fubuki_illust
Source: Hugging Face. Updated 2026-04-20; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/l2533584225/shirakami_fubuki_illust
Resource description:
---
license: apache-2.0
tags:
- anime
- illustration
- vtuber
- hololive
- shirakami fubuki
---
# Shirakami Fubuki Illustrations Dataset
A curated dataset of high-quality illustrations featuring **Shirakami Fubuki** (白上フブキ), a popular virtual YouTuber from Hololive.
## Dataset Description
This dataset contains original artwork and fan art of Shirakami Fubuki, collected from Danbooru, a popular anime-style image board. All images are tagged with comprehensive metadata including artist information, character tags, and copyright details.
### Key Features
- **High-Quality Images**: Original resolution artwork from various artists
- **Rich Metadata**: Each image includes:
  - Artist name(s)
  - Character tags
  - Copyright information
  - General descriptive tags
- **Solo Focus**: Primarily features solo illustrations of Shirakami Fubuki
- **Diverse Art Styles**: Multiple artistic interpretations and styles
## Dataset Structure
The dataset is organized by artist, with each artist having their own directory:
```
shirakami_fubuki_illust/
├── artist_name_1/          # Directory named after the artist
│   ├── image_1.png         # Artwork by this artist
│   ├── image_2.jpg
│   └── ...
├── artist_name_2/
│   ├── image_1.png
│   └── ...
├── ...
├── metadata.jsonl          # JSONL file containing all metadata
└── README.md               # This file
```
### Directory Organization
- **Each subdirectory** represents a unique artist from Danbooru
- **Directory name** matches the artist's Danbooru tag name
- **Images** within each directory are original artwork by that artist
- **Total artists**: 800+ unique contributors
- **File naming**: Images retain their original filenames from Danbooru
### Metadata Format
Metadata is stored in `metadata.jsonl` (JSON Lines format), with each line containing:
```json
{
  "file_name": "artist_name/image_hash_postid_resolution.jpg",
  "text": "shirakami fubuki, 1girl, solo, white hair, fox ears, ..."
}
```
**Fields:**
- **`file_name`**: Relative path to the image file (artist directory + filename)
  - Format: `{artist}/{hash}_{post_id}_{width}x{height}.{ext}`
  - The artist directory name matches the first artist tag
- **`text`**: Comma-separated tags describing the image
  - Character tags: `shirakami fubuki`, `shirakami fubuki (1st costume)`, etc.
  - General tags: `1girl`, `solo`, `white hair`, `fox ears`, `blush`, etc.
  - Artist information is encoded in the directory structure
**Example:**
```json
{
  "file_name": "hyde__tabakko_/b64e93b423d2794a4ba141740cbd2077_11192398_3840x2160.jpg",
  "text": "shirakami fubuki, sukonbu (shirakami fubuki), 1girl, animal ears, fox ears, fox girl, solo, white hair"
}
```
This corresponds to:
- **Artist**: `hyde__tabakko_` (directory name)
- **Image**: `b64e93b423d2794a4ba141740cbd2077_11192398_3840x2160.jpg`
- **Post ID**: 11192398
- **Resolution**: 3840x2160
- **Tags**: shirakami fubuki, sukonbu (shirakami fubuki), 1girl, animal ears, fox ears, fox girl, solo, white hair
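The breakdown above can be reproduced programmatically. The following is a minimal sketch that assumes every `file_name` follows the `{artist}/{hash}_{post_id}_{width}x{height}.{ext}` format and that the hash is lowercase hexadecimal, as in the example. The names `ParsedName` and `parse_file_name` are illustrative and not part of the dataset.

```python
import re
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    artist: str
    file_hash: str
    post_id: int
    width: int
    height: int
    ext: str


# Pattern for "{artist}/{hash}_{post_id}_{width}x{height}.{ext}".
# Assumption: the hash is lowercase hexadecimal, as in the documented example.
_NAME_RE = re.compile(
    r"^(?P<artist>[^/]+)/"
    r"(?P<hash>[0-9a-f]+)_(?P<post_id>\d+)_"
    r"(?P<width>\d+)x(?P<height>\d+)\.(?P<ext>\w+)$"
)


def parse_file_name(file_name: str) -> Optional[ParsedName]:
    """Split a metadata `file_name` into its parts; return None if it does not match."""
    m = _NAME_RE.match(file_name)
    if m is None:
        return None
    return ParsedName(
        artist=m["artist"],
        file_hash=m["hash"],
        post_id=int(m["post_id"]),
        width=int(m["width"]),
        height=int(m["height"]),
        ext=m["ext"],
    )


parsed = parse_file_name(
    "hyde__tabakko_/b64e93b423d2794a4ba141740cbd2077_11192398_3840x2160.jpg"
)
print(parsed)
```

Returning `None` for non-matching names keeps the helper safe to run over the whole metadata file, in case some entries deviate from the documented format.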
## Usage
### Loading the Dataset
```python
import json
from pathlib import Path

from PIL import Image

# Load metadata (JSONL format: one JSON object per line)
metadata = []
with open('metadata.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        if line.strip():
            metadata.append(json.loads(line))
print(f"Loaded {len(metadata)} images")

# Access images by artist (run from the dataset root directory)
artist_name = "hyde__tabakko_"
artist_dir = Path(artist_name)
if artist_dir.exists():
    # List all images by this artist
    image_files = (
        list(artist_dir.glob("*.png"))
        + list(artist_dir.glob("*.jpg"))
        + list(artist_dir.glob("*.webp"))
    )
    print(f"Found {len(image_files)} images by {artist_name}")
    # Open and process images
    for img_path in image_files[:5]:  # First 5 images
        with Image.open(img_path) as img:
            print(f"  {img_path.name}: {img.size}")
```
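After loading, it can be useful to confirm that every metadata entry points at a file that is actually present, for example after a partial download. The sketch below assumes it is run against the dataset root and that each record has a `file_name` field as described in the metadata format. The helper name `find_missing_files` is illustrative.

```python
import json
from pathlib import Path
from typing import List


def find_missing_files(root: Path, metadata_name: str = "metadata.jsonl") -> List[str]:
    """Return the `file_name` entries in the metadata that have no file on disk."""
    missing = []
    with open(root / metadata_name, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            file_name = json.loads(line)["file_name"]
            if not (root / file_name).is_file():
                missing.append(file_name)
    return missing


# Only runs the check when a metadata file exists in the current directory.
root = Path(".")
if (root / "metadata.jsonl").is_file():
    missing = find_missing_files(root)
    print(f"{len(missing)} metadata entries have no matching file")
```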
### Filtering by Tags
```python
import json
from pathlib import Path

# Load all metadata
with open('metadata.jsonl', 'r', encoding='utf-8') as f:
    posts = [json.loads(line) for line in f if line.strip()]
print(f"Total images: {len(posts)}")

# Filter by specific tags
def has_tag(post, tag):
    """Check if a post contains a specific tag"""
    tags = [t.strip() for t in post.get('text', '').split(',')]
    return tag in tags

# Get solo illustrations
solo_posts = [post for post in posts if has_tag(post, 'solo')]
print(f"Solo illustrations: {len(solo_posts)}")

# Get images with fox ears
fox_ear_posts = [post for post in posts if has_tag(post, 'fox ears')]
print(f"Images with fox ears: {len(fox_ear_posts)}")

# Get images by a specific artist
artist = "hyde__tabakko_"
artist_posts = [
    post for post in posts
    if post.get('file_name', '').startswith(f"{artist}/")
]
print(f"Images by {artist}: {len(artist_posts)}")

# Browse by directory structure (run from the dataset root directory)
artists = [d.name for d in Path('.').iterdir()
           if d.is_dir() and not d.name.startswith('.')]
print(f"\nTotal artists: {len(artists)}")

# Get top artists by work count
artist_counts = [
    (artist, len(list(Path(artist).glob('*'))))
    for artist in artists
]
artist_counts.sort(key=lambda x: x[1], reverse=True)
print("\nTop 10 artists by work count:")
for artist, count in artist_counts[:10]:
    print(f"  {artist}: {count} images")

# Find images with multiple specific tags
def has_all_tags(post, tags):
    """Check if a post contains all specified tags"""
    post_tags = [t.strip() for t in post.get('text', '').split(',')]
    return all(tag in post_tags for tag in tags)

# Example: Solo + white hair + fox ears
filtered = [
    post for post in posts
    if has_all_tags(post, ['solo', 'white hair', 'fox ears'])
]
print(f"\nSolo + white hair + fox ears: {len(filtered)} images")
```
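To decide which tags are worth filtering on, it may help to look at how often each tag occurs. The sketch below counts tags across records in the metadata format. The two sample records are made up for illustration and are not real dataset rows, and `tag_counts` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, Iterable


def tag_counts(posts: Iterable[Dict[str, str]]) -> Counter:
    """Count how many posts carry each tag in the comma-separated `text` field."""
    counts: Counter = Counter()
    for post in posts:
        # Use a set so a tag repeated within one post is counted once.
        tags = {t.strip() for t in post.get("text", "").split(",") if t.strip()}
        counts.update(tags)
    return counts


# Inline sample records in the metadata format (illustrative only).
sample_posts = [
    {"file_name": "artist_a/1.jpg", "text": "shirakami fubuki, 1girl, solo, fox ears"},
    {"file_name": "artist_b/2.png", "text": "shirakami fubuki, 1girl, white hair"},
]
for tag, n in tag_counts(sample_posts).most_common(3):
    print(f"{tag}: {n}")
```

With the real data, pass the `posts` list loaded from `metadata.jsonl` instead of `sample_posts`.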
## Data Collection
The dataset was collected using an automated crawler with the following specifications:
- **Source**: Danbooru (danbooru.donmai.us)
- **Search Tags**: `shirakami_fubuki + solo`
- **Collection Method**: Playwright-based web scraping with HTTP/2 support
- **Quality Control**: Only original/high-resolution images included
- **Total Pages Crawled**: 866 pages
- **Retry Mechanism**: Failed downloads automatically retried
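
The crawler itself is not included in this dataset, so the following is only a generic sketch of what a retry mechanism for failed downloads could look like, not the implementation actually used. The function name, the `fetch` callable, and the default attempt count and delays are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional


def download_with_retry(
    fetch: Callable[[str], bytes],
    url: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call `fetch(url)` up to `max_attempts` times, doubling the delay after each failure."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception as exc:  # a real crawler would catch narrower error types
            if attempt == max_attempts - 1:
                print(f"Giving up on {url}: {exc}")
                return None
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return None
```

Passing `fetch` and `sleep` in as parameters keeps the sketch independent of any particular HTTP or browser library and makes it easy to exercise without network access.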
## Statistics
- **Total Posts**: 8,500+ illustrations
- **Unique Artists**: 800+ contributing artists
- **Image Formats**: PNG, JPG, WEBP
- **Organization**: Artist-based directory structure
- **Metadata Format**: JSONL (JSON Lines)
- **License**: Apache 2.0 (dataset structure and metadata)
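
The figures above can be recomputed from a local copy. The sketch below derives the post count, the number of unique artist directories, and the per-format counts from metadata records, assuming each `file_name` starts with the artist directory as described in the metadata format. The helper name `summarize` is illustrative.

```python
import json
from collections import Counter
from pathlib import Path, PurePosixPath
from typing import Dict, Iterable


def summarize(records: Iterable[Dict[str, str]]) -> Dict[str, object]:
    """Compute post count, unique artist count, and per-extension counts."""
    artists = set()
    formats: Counter = Counter()
    total = 0
    for rec in records:
        path = PurePosixPath(rec["file_name"])
        artists.add(path.parts[0])  # first path component is the artist directory
        formats[path.suffix.lower().lstrip(".")] += 1
        total += 1
    return {"posts": total, "artists": len(artists), "formats": dict(formats)}


# Only runs when a metadata file exists in the current directory.
metadata_path = Path("metadata.jsonl")
if metadata_path.is_file():
    with open(metadata_path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    print(summarize(records))
```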
## Ethical Considerations
### Attribution
The artwork in this dataset was created by many different artists. When using these images:
1. **Credit the Artists**: Always attribute the original artists when possible
2. **Respect Copyright**: Individual artworks may have different licenses
3. **Non-Commercial Use**: Consider contacting artists for commercial usage
4. **Artist Preferences**: Some artists may not want their work used for AI training
### Content Notes
- This dataset focuses on safe-for-work (SFW) content
- All images are tagged according to Danbooru's tagging system
- Users should review content before use in their projects
## Citation
If you use this dataset in your research or project, please cite:
```bibtex
@dataset{shirakami_fubuki_illust,
author = {Various Artists},
title = {Shirakami Fubuki Illustrations Dataset},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/l2533584225/shirakami_fubuki_illust}
}
```
## Related Resources
- [Shirakami Fubuki Official Channel](https://www.youtube.com/@ShirakamiFubuki)
- [Hololive Production](https://hololive.hololivepro.com/)
- [Danbooru](https://danbooru.donmai.us/)
## License
- **Dataset Structure & Metadata**: Apache 2.0
- **Individual Artworks**: Subject to respective artists' terms
- Please review each image's source and artist preferences before use
## Contact
For questions, issues, or takedown requests, please open an issue on the repository.
---
**Note**: This dataset is intended for research, education, and personal projects. Always respect artists' rights and preferences.
Provider:
l2533584225



