
l2533584225/shirakami_fubuki_illust

Source: Hugging Face | Updated: 2026-04-20 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/l2533584225/shirakami_fubuki_illust
Resource description:
---
license: apache-2.0
tags:
- anime
- illustration
- vtuber
- hololive
- shirakami fubuki
---

# Shirakami Fubuki Illustrations Dataset

A curated dataset of high-quality illustrations featuring **Shirakami Fubuki** (白上フブキ), a popular virtual YouTuber from Hololive.

## Dataset Description

This dataset contains original artwork and fan art of Shirakami Fubuki, collected from Danbooru, a popular anime-style image board. All images are tagged with comprehensive metadata including artist information, character tags, and copyright details.

### Key Features

- **High-Quality Images**: Original resolution artwork from various artists
- **Rich Metadata**: Each image includes:
  - Artist name(s)
  - Character tags
  - Copyright information
  - General descriptive tags
- **Solo Focus**: Primarily features solo illustrations of Shirakami Fubuki
- **Diverse Art Styles**: Multiple artistic interpretations and styles

## Dataset Structure

The dataset is organized by artist, with each artist having their own directory:

```
shirakami_fubuki_illust/
├── artist_name_1/          # Directory named after the artist
│   ├── image_1.png         # Artwork by this artist
│   ├── image_2.jpg
│   └── ...
├── artist_name_2/
│   ├── image_1.png
│   └── ...
├── ...
├── metadata.jsonl          # JSONL file containing all metadata
└── README.md               # This file
```

### Directory Organization

- **Each subdirectory** represents a unique artist from Danbooru
- **Directory name** matches the artist's Danbooru tag name
- **Images** within each directory are original artwork by that artist
- **Total artists**: 800+ unique contributors
- **File naming**: Images retain their original filenames from Danbooru

### Metadata Format

Metadata is stored in `metadata.jsonl` (JSON Lines format), with each line containing:

```json
{
  "file_name": "artist_name/image_hash_postid_resolution.jpg",
  "text": "shirakami fubuki, 1girl, solo, white hair, fox ears, ..."
}
```

**Fields:**

- **`file_name`**: Relative path to the image file (artist directory + filename)
  - Format: `{artist}/{hash}_{post_id}_{width}x{height}.{ext}`
  - The artist directory name matches the first artist tag
- **`text`**: Comma-separated tags describing the image
  - Character tags: `shirakami fubuki`, `shirakami fubuki (1st costume)`, etc.
  - General tags: `1girl`, `solo`, `white hair`, `fox ears`, `blush`, etc.
  - Artist information is encoded in the directory structure

**Example:**

```json
{
  "file_name": "hyde__tabakko_/b64e93b423d2794a4ba141740cbd2077_11192398_3840x2160.jpg",
  "text": "shirakami fubuki, sukonbu (shirakami fubuki), 1girl, animal ears, fox ears, fox girl, solo, white hair"
}
```

This corresponds to:

- **Artist**: `hyde__tabakko_` (directory name)
- **Image**: `b64e93b423d2794a4ba141740cbd2077_11192398_3840x2160.jpg`
- **Post ID**: 11192398
- **Resolution**: 3840x2160
- **Tags**: shirakami fubuki, sukonbu (shirakami fubuki), 1girl, animal ears, fox ears, fox girl, solo, white hair

## Usage

### Loading the Dataset

```python
import json
from pathlib import Path

from PIL import Image

# Load metadata (JSONL format)
metadata = []
with open('metadata.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        if line.strip():
            metadata.append(json.loads(line))

print(f"Loaded {len(metadata)} images")

# Access images by artist
artist_name = "hyde__tabakko_"
artist_dir = Path(artist_name)

if artist_dir.exists():
    # List all images by this artist
    image_files = list(artist_dir.glob("*.png")) + \
                  list(artist_dir.glob("*.jpg")) + \
                  list(artist_dir.glob("*.webp"))
    print(f"Found {len(image_files)} images by {artist_name}")

    # Open and process images
    for img_path in image_files[:5]:  # First 5 images
        img = Image.open(img_path)
        print(f"  {img_path.name}: {img.size}")
```

### Filtering by Tags

```python
import json
from pathlib import Path

# Load all metadata
with open('metadata.jsonl', 'r', encoding='utf-8') as f:
    posts = [json.loads(line) for line in f if line.strip()]

print(f"Total images: {len(posts)}")

# Filter by specific tags
def has_tag(post, tag):
    """Check if a post contains a specific tag"""
    tags = [t.strip() for t in post.get('text', '').split(',')]
    return tag in tags

# Get solo illustrations
solo_posts = [post for post in posts if has_tag(post, 'solo')]
print(f"Solo illustrations: {len(solo_posts)}")

# Get images with fox ears
fox_ear_posts = [post for post in posts if has_tag(post, 'fox ears')]
print(f"Images with fox ears: {len(fox_ear_posts)}")

# Get images by a specific artist
artist = "hyde__tabakko_"
artist_posts = [
    post for post in posts
    if post.get('file_name', '').startswith(f"{artist}/")
]
print(f"Images by {artist}: {len(artist_posts)}")

# Browse by directory structure
artists = [d.name for d in Path('.').iterdir()
           if d.is_dir() and not d.name.startswith('.')]
print(f"\nTotal artists: {len(artists)}")

# Get top artists by work count
artist_counts = [
    (artist, len(list(Path(artist).glob('*'))))
    for artist in artists
]
artist_counts.sort(key=lambda x: x[1], reverse=True)

print("\nTop 10 artists by work count:")
for artist, count in artist_counts[:10]:
    print(f"  {artist}: {count} images")

# Find images with multiple specific tags
def has_all_tags(post, tags):
    """Check if a post contains all specified tags"""
    post_tags = [t.strip() for t in post.get('text', '').split(',')]
    return all(tag in post_tags for tag in tags)

# Example: Solo + white hair + fox ears
filtered = [
    post for post in posts
    if has_all_tags(post, ['solo', 'white hair', 'fox ears'])
]
print(f"\nSolo + white hair + fox ears: {len(filtered)} images")
```

## Data Collection

The dataset was collected using an automated crawler with the following specifications:

- **Source**: Danbooru (danbooru.donmai.us)
- **Search Tags**: `shirakami_fubuki + solo`
- **Collection Method**: Playwright-based web scraping with HTTP/2 support
- **Quality Control**: Only original/high-resolution images included
- **Total Pages Crawled**: 866
- **Retry Mechanism**: Failed downloads automatically retried

## Statistics

- **Total Posts**: ~8,500+ illustrations
- **Unique Artists**: 800+ contributing artists
- **Image Formats**: PNG, JPG, WEBP
- **Organization**: Artist-based directory structure
- **Metadata Format**: JSONL (JSON Lines)
- **License**: Apache 2.0 (dataset structure and metadata)

## Ethical Considerations

### Attribution

All artwork in this dataset was created by various artists. When using these images:

1. **Credit the Artists**: Always attribute the original artists when possible
2. **Respect Copyright**: Individual artworks may have different licenses
3. **Non-Commercial Use**: Consider contacting artists for commercial usage
4. **Artist Preferences**: Some artists may not want their work used for AI training

### Content Notes

- This dataset focuses on safe-for-work (SFW) content
- All images are tagged according to Danbooru's tagging system
- Users should review content before use in their projects

## Citation

If you use this dataset in your research or project, please cite:

```bibtex
@dataset{shirakami_fubuki_illust,
  author    = {Various Artists},
  title     = {Shirakami Fubuki Illustrations Dataset},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/l2533584225/shirakami_fubuki_illust}
}
```

## Related Resources

- [Shirakami Fubuki Official Channel](https://www.youtube.com/@ShirakamiFubuki)
- [Hololive Production](https://hololive.hololivepro.com/)
- [Danbooru](https://danbooru.donmai.us/)

## License

- **Dataset Structure & Metadata**: Apache 2.0
- **Individual Artworks**: Subject to respective artists' terms
- Please review each image's source and artist preferences before use

## Contact

For questions, issues, or takedown requests, please open an issue on the repository.

---

**Note**: This dataset is intended for research, education, and personal projects. Always respect artists' rights and preferences.
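
## Appendix: Parsing `file_name`

The `file_name` format documented under Metadata Format, `{artist}/{hash}_{post_id}_{width}x{height}.{ext}`, can be split back into its parts with a regular expression. The following is a minimal sketch rather than part of the dataset tooling: the names `FILE_NAME_RE`, `ParsedName`, and `parse_file_name` are hypothetical, and the pattern assumes the hash is lowercase hexadecimal, as in the README example.

```python
import re
from typing import NamedTuple, Optional

# Pattern for "{artist}/{hash}_{post_id}_{width}x{height}.{ext}".
# Assumption: the hash is lowercase hexadecimal, as in the README example.
FILE_NAME_RE = re.compile(
    r"^(?P<artist>[^/]+)/"
    r"(?P<image_hash>[0-9a-f]+)_"
    r"(?P<post_id>\d+)_"
    r"(?P<width>\d+)x(?P<height>\d+)"
    r"\.(?P<ext>[A-Za-z0-9]+)$"
)


class ParsedName(NamedTuple):
    """Components recovered from a metadata `file_name` value."""
    artist: str
    image_hash: str
    post_id: int
    width: int
    height: int
    ext: str


def parse_file_name(file_name: str) -> Optional[ParsedName]:
    """Return the parsed components, or None if the name does not match."""
    match = FILE_NAME_RE.match(file_name)
    if match is None:
        return None
    return ParsedName(
        artist=match["artist"],
        image_hash=match["image_hash"],
        post_id=int(match["post_id"]),
        width=int(match["width"]),
        height=int(match["height"]),
        ext=match["ext"],
    )
```

Returning `None` for non-matching names lets callers skip entries that do not follow the documented pattern, for example when filtering `metadata.jsonl` records by minimum resolution via the `width` and `height` fields.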
Provider:
l2533584225