CyberHarem/shirase_sakuya_theidolmstershinycolors
Source: Hugging Face · Updated 2024-01-16 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/CyberHarem/shirase_sakuya_theidolmstershinycolors
Resource description:
---
license: mit
task_categories:
- text-to-image
tags:
- art
- not-for-all-audiences
size_categories:
- n<1K
---
# Dataset of shirase_sakuya/白瀬咲耶 (THE iDOLM@STER: SHINY COLORS)
This is the dataset of shirase_sakuya/白瀬咲耶 (THE iDOLM@STER: SHINY COLORS), containing 500 images and their tags.
The core tags of this character are `long_hair, black_hair, yellow_eyes, breasts, bangs, hair_between_eyes, large_breasts, ponytail, high_ponytail`, which are pruned in this dataset.
Images are crawled from many sites (e.g. danbooru, pixiv, zerochan). The auto-crawling system is powered by the [DeepGHS Team](https://github.com/deepghs) ([huggingface organization](https://huggingface.co/deepghs)).
## List of Packages
| Name | Images | Size | Download | Type | Description |
|:-----------------|---------:|:-----------|:----------------------------------------------------------------------------------------------------------------------------------------|:-----------|:---------------------------------------------------------------------|
| raw | 500 | 796.73 MiB | [Download](https://huggingface.co/datasets/CyberHarem/shirase_sakuya_theidolmstershinycolors/resolve/main/dataset-raw.zip) | Waifuc-Raw | Raw data with meta information (min edge aligned to 1400 if larger). |
| 800 | 500 | 409.94 MiB | [Download](https://huggingface.co/datasets/CyberHarem/shirase_sakuya_theidolmstershinycolors/resolve/main/dataset-800.zip) | IMG+TXT | dataset with the shorter side not exceeding 800 pixels. |
| stage3-p480-800 | 1220 | 874.63 MiB | [Download](https://huggingface.co/datasets/CyberHarem/shirase_sakuya_theidolmstershinycolors/resolve/main/dataset-stage3-p480-800.zip) | IMG+TXT | 3-stage cropped dataset with the area not less than 480x480 pixels. |
| 1200 | 500 | 683.72 MiB | [Download](https://huggingface.co/datasets/CyberHarem/shirase_sakuya_theidolmstershinycolors/resolve/main/dataset-1200.zip) | IMG+TXT | dataset with the shorter side not exceeding 1200 pixels. |
| stage3-p480-1200 | 1220 | 1.30 GiB | [Download](https://huggingface.co/datasets/CyberHarem/shirase_sakuya_theidolmstershinycolors/resolve/main/dataset-stage3-p480-1200.zip) | IMG+TXT | 3-stage cropped dataset with the area not less than 480x480 pixels. |
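The IMG+TXT packages can be read without waifuc. The following is a minimal sketch, assuming each image in an extracted archive sits next to a `.txt` file with the same stem that holds comma-separated tags; this layout is an assumption, not something the card states. The helper name `iter_image_tag_pairs` and the suffix list are hypothetical.

```python
from pathlib import Path
from typing import Iterator, List, Tuple

# Assumed image extensions; adjust to what the extracted archive actually contains.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def iter_image_tag_pairs(root: str) -> Iterator[Tuple[Path, List[str]]]:
    """Yield (image_path, tags) for every image that has a same-stem .txt file.

    Assumes the tag file holds comma-separated tags (hypothetical layout).
    """
    for image_path in sorted(Path(root).rglob("*")):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        tag_path = image_path.with_suffix(".txt")
        if not tag_path.is_file():
            continue  # skip images that have no tag file
        text = tag_path.read_text(encoding="utf-8")
        tags = [t.strip() for t in text.split(",") if t.strip()]
        yield image_path, tags
```

After extracting, for example, `dataset-800.zip` into a directory, `for path, tags in iter_image_tag_pairs('dataset_dir'): ...` would walk the pairs under that assumption.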
### Load Raw Dataset with Waifuc
We provide the raw dataset (including tagged images) for loading with [waifuc](https://deepghs.github.io/waifuc/main/tutorials/installation/index.html). To use it, run the following code:
```python
import os
import zipfile

from huggingface_hub import hf_hub_download
from waifuc.source import LocalSource

# download raw archive file
zip_file = hf_hub_download(
    repo_id='CyberHarem/shirase_sakuya_theidolmstershinycolors',
    repo_type='dataset',
    filename='dataset-raw.zip',
)

# extract files to your directory
dataset_dir = 'dataset_dir'
os.makedirs(dataset_dir, exist_ok=True)
with zipfile.ZipFile(zip_file, 'r') as zf:
    zf.extractall(dataset_dir)

# load the dataset with waifuc
source = LocalSource(dataset_dir)
for item in source:
    print(item.image, item.meta['filename'], item.meta['tags'])
```
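Because the card is tagged `not-for-all-audiences`, it can be useful to narrow the loaded items down by tag before training. The following is a minimal sketch, assuming `item.meta['tags']` is a mapping from tag name to score, a sequence of tag names, or a comma-separated string; the card does not specify the structure, so all three are handled. The helpers `tag_names` and `keep_item` are hypothetical and not part of waifuc.

```python
from collections.abc import Mapping
from typing import Iterable, Set, Union

Tags = Union[Mapping, Iterable[str], str]


def tag_names(tags: Tags) -> Set[str]:
    """Normalize the assumed tag representations to a set of tag names."""
    if isinstance(tags, str):
        return {t.strip() for t in tags.split(",") if t.strip()}
    if isinstance(tags, Mapping):
        return set(tags.keys())
    return set(tags)


def keep_item(tags: Tags,
              required: Iterable[str] = (),
              blocked: Iterable[str] = ()) -> bool:
    """Return True if all required tags are present and no blocked tag is."""
    names = tag_names(tags)
    return set(required) <= names and names.isdisjoint(set(blocked))
```

Inside the loading loop, a call such as `keep_item(item.meta['tags'], required=['solo'], blocked=my_blocklist)` would decide whether to keep an item, where `my_blocklist` is a set of tags you choose to exclude.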
## List of Clusters
Tag clustering results. Some outfits may be identifiable from these clusters.
### Raw Text Version
| # | Samples | Img-1 | Img-2 | Img-3 | Img-4 | Img-5 | Tags |
|----:|----------:|:--------------------------------|:--------------------------------|:--------------------------------|:--------------------------------|:--------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0 | 58 |  |  |  |  |  | white_shirt, 1girl, school_uniform, solo, looking_at_viewer, cleavage, pleated_skirt, smile, blush, plaid_skirt, collarbone, dog_tags, green_necktie, dress_shirt, white_background, necklace, simple_background |
| 1 | 6 |  |  |  |  |  | 1boy, 1girl, blush, dress_shirt, female_pubic_hair, hetero, loose_necktie, plaid_skirt, pleated_skirt, pussy, school_uniform, spread_legs, white_shirt, black_socks, green_necktie, green_skirt, kneehighs, penis, sex, skirt_lift, solo_focus, thighs, collarbone, collared_shirt, dog_tags, mosaic_censoring, nipples, vaginal, arms_up, black_panties, dark-skinned_male, handcuffs, interracial, missionary, open_mouth, open_shirt, panties_aside, rape |
| 2 | 32 |  |  |  |  |  | 1girl, solo, cleavage, looking_at_viewer, smile, collarbone, blush, navel, earrings, choker, bare_shoulders, red_bikini, necklace, thighs, side-tie_bikini_bottom |
| 3 | 9 |  |  |  |  |  | 1girl, cleavage, earrings, looking_at_viewer, solo, bare_shoulders, collarbone, blush, smile, blue_dress, choker, hair_flower, necklace, petals, rose |
| 4 | 11 |  |  |  |  |  | 1girl, blush, collarbone, looking_at_viewer, navel, nipples, solo, completely_nude, female_pubic_hair, closed_mouth, smile, thighs, simple_background |
| 5 | 8 |  |  |  |  |  | 1girl, blush, hetero, penis, spread_legs, 1boy, female_pubic_hair, mosaic_censoring, navel, nipples, sweat, completely_nude, solo_focus, on_back, collarbone, cum_in_pussy, after_sex, after_vaginal, cumdrip, open_mouth, tongue |
| 6 | 5 |  |  |  |  |  | 1girl, earrings, looking_at_viewer, shirt, solo, long_sleeves, necklace, parted_lips, collarbone, pants, simple_background, smile, blush, choker, very_long_hair |
| 7 | 5 |  |  |  |  |  | 1boy, 1girl, fellatio, hetero, penis, solo_focus, blush, cleavage, dress_shirt, male_pubic_hair, mosaic_censoring, white_shirt, collared_shirt, erection, necktie, stray_pubic_hair, collarbone, earrings, looking_at_viewer, nude, school_uniform, sweat |
| 8 | 6 |  |  |  |  |  | 1girl, black_dress, enmaided, looking_at_viewer, maid_headdress, solo, blush, frills, maid_apron, cleavage, smile, white_apron, simple_background, thighhighs |
| 9 | 5 |  |  |  |  |  | black_leotard, bowtie, cleavage, detached_collar, fake_animal_ears, looking_at_viewer, playboy_bunny, rabbit_ears, 1girl, bare_shoulders, blush, solo, wrist_cuffs, black_bow, smile, strapless_leotard, black_pantyhose, covered_navel, earrings, fishnets, rabbit_tail, simple_background, sitting, table, thighs, white_background |
### Table Version
| # | Samples | Img-1 | Img-2 | Img-3 | Img-4 | Img-5 | white_shirt | 1girl | school_uniform | solo | looking_at_viewer | cleavage | pleated_skirt | smile | blush | plaid_skirt | collarbone | dog_tags | green_necktie | dress_shirt | white_background | necklace | simple_background | 1boy | female_pubic_hair | hetero | loose_necktie | pussy | spread_legs | black_socks | green_skirt | kneehighs | penis | sex | skirt_lift | solo_focus | thighs | collared_shirt | mosaic_censoring | nipples | vaginal | arms_up | black_panties | dark-skinned_male | handcuffs | interracial | missionary | open_mouth | open_shirt | panties_aside | rape | navel | earrings | choker | bare_shoulders | red_bikini | side-tie_bikini_bottom | blue_dress | hair_flower | petals | rose | completely_nude | closed_mouth | sweat | on_back | cum_in_pussy | after_sex | after_vaginal | cumdrip | tongue | shirt | long_sleeves | parted_lips | pants | very_long_hair | fellatio | male_pubic_hair | erection | necktie | stray_pubic_hair | nude | black_dress | enmaided | maid_headdress | frills | maid_apron | white_apron | thighhighs | black_leotard | bowtie | detached_collar | fake_animal_ears | playboy_bunny | rabbit_ears | wrist_cuffs | black_bow | strapless_leotard | black_pantyhose | covered_navel | fishnets | rabbit_tail | sitting | table |
|----:|----------:|:--------------------------------|:--------------------------------|:--------------------------------|:--------------------------------|:--------------------------------|:--------------|:--------|:-----------------|:-------|:--------------------|:-----------|:----------------|:--------|:--------|:--------------|:-------------|:-----------|:----------------|:--------------|:-------------------|:-----------|:--------------------|:-------|:--------------------|:---------|:----------------|:--------|:--------------|:--------------|:--------------|:------------|:--------|:------|:-------------|:-------------|:---------|:-----------------|:-------------------|:----------|:----------|:----------|:----------------|:--------------------|:------------|:--------------|:-------------|:-------------|:-------------|:----------------|:-------|:--------|:-----------|:---------|:-----------------|:-------------|:-------------------------|:-------------|:--------------|:---------|:-------|:------------------|:---------------|:--------|:----------|:---------------|:------------|:----------------|:----------|:---------|:--------|:---------------|:--------------|:--------|:-----------------|:-----------|:------------------|:-----------|:----------|:-------------------|:-------|:--------------|:-----------|:-----------------|:---------|:-------------|:--------------|:-------------|:----------------|:---------|:------------------|:-------------------|:----------------|:--------------|:--------------|:------------|:--------------------|:------------------|:----------------|:-----------|:--------------|:----------|:--------|
| 0 | 58 |  |  |  |  |  | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 1 | 6 |  |  |  |  |  | X | X | X | | | | X | | X | X | X | X | X | X | | | | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 2 | 32 |  |  |  |  |  | | X | | X | X | X | | X | X | | X | | | | | X | | | | | | | | | | | | | | | X | | | | | | | | | | | | | | | X | X | X | X | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 3 | 9 |  |  |  |  |  | | X | | X | X | X | | X | X | | X | | | | | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | X | X | X | | | X | X | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 4 | 11 |  |  |  |  |  | | X | | X | X | | | X | X | | X | | | | | | X | | X | | | | | | | | | | | | X | | | X | | | | | | | | | | | | X | | | | | | | | | | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 5 | 8 |  |  |  |  |  | | X | | | | | | | X | | X | | | | | | | X | X | X | | | X | | | | X | | | X | | | X | X | | | | | | | | X | | | | X | | | | | | | | | | X | | X | X | X | X | X | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 6 | 5 |  |  |  |  |  | | X | | X | X | | | X | X | | X | | | | | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | X | X | | | | | | | | | | | | | | | | | X | X | X | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 7 | 5 |  |  |  |  |  | X | X | X | | X | X | | | X | | X | | | X | | | | X | | X | | | | | | | X | | | X | | X | X | | | | | | | | | | | | | | X | | | | | | | | | | | X | | | | | | | | | | | | X | X | X | X | X | X | | | | | | | | | | | | | | | | | | | | | | |
| 8 | 6 |  |  |  |  |  | | X | | X | X | X | | X | X | | | | | | | | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | X | X | X | X | X | X | X | | | | | | | | | | | | | | | |
| 9 | 5 |  |  |  |  |  | | X | | X | X | X | | X | X | | | | | | X | | X | | | | | | | | | | | | | | X | | | | | | | | | | | | | | | | X | | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X |
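The table version above is a presence matrix: one row per cluster, one column per tag, with `X` marking tags that belong to the cluster. The following is a minimal sketch of how such a matrix can be derived from per-cluster tag lists. It assumes columns are ordered by first appearance across clusters, which appears to match the table but is not documented. The function name `build_presence_table` is hypothetical.

```python
from typing import Dict, List, Tuple


def build_presence_table(
    clusters: Dict[int, List[str]],
) -> Tuple[List[str], Dict[int, List[str]]]:
    """Return (columns, rows): columns are tags in first-appearance order,
    rows map each cluster id to a list of 'X' or '' per column."""
    columns: List[str] = []
    seen = set()
    for cluster_id in sorted(clusters):
        for tag in clusters[cluster_id]:
            if tag not in seen:
                seen.add(tag)
                columns.append(tag)

    rows: Dict[int, List[str]] = {}
    for cluster_id, tags in clusters.items():
        present = set(tags)
        rows[cluster_id] = ["X" if tag in present else "" for tag in columns]
    return columns, rows


# Illustrative input only; not the dataset's real clusters.
example = {0: ["solo", "smile", "necklace"], 1: ["solo", "earrings"]}
columns, rows = build_presence_table(example)
```

Here `columns` collects every distinct tag once, and each row marks which of those tags its cluster contains.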
Provider:
CyberHarem
Summary of Original Information
Dataset Overview
Basic Information
- Name: Dataset of shirase_sakuya/白瀬咲耶 (THE iDOLM@STER: SHINY COLORS)
- License: MIT
- Task category: text-to-image
- Tags: art, not-for-all-audiences
- Size category: n<1K
Dataset Contents
- Number of images: 500
- Core tags: long_hair, black_hair, yellow_eyes, breasts, bangs, hair_between_eyes, large_breasts, ponytail, high_ponytail
Package List
| Name | Images | Size | Type | Description |
|---|---|---|---|---|
| raw | 500 | 796.73 MiB | Waifuc-Raw | Raw data with meta information (min edge aligned to 1400 pixels if larger) |
| 800 | 500 | 409.94 MiB | IMG+TXT | Dataset with the shorter side not exceeding 800 pixels |
| stage3-p480-800 | 1220 | 874.63 MiB | IMG+TXT | 3-stage cropped dataset with the area not less than 480x480 pixels |
| 1200 | 500 | 683.72 MiB | IMG+TXT | Dataset with the shorter side not exceeding 1200 pixels |
| stage3-p480-1200 | 1220 | 1.30 GiB | IMG+TXT | 3-stage cropped dataset with the area not less than 480x480 pixels |
Collected Summary
Dataset Introduction
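Construction
This dataset focuses on the character Shirase Sakuya (白瀬咲耶) from THE iDOLM@STER: SHINY COLORS. It was assembled with the DeepGHS team's automated crawling system, which collected 500 images and their tags from sites such as Danbooru, Pixiv, and Zerochan. The character's core tags (long hair, black hair, yellow eyes, and so on) are pruned so that the remaining tags stay clean. The dataset ships in several packages to suit different training needs: the raw data with meta information (raw), resized versions whose shorter side does not exceed 800 or 1200 pixels, and three-stage cropped versions (stage3-p480-800, stage3-p480-1200).
Usage
Users can download the individual package archives directly from the Hugging Face Hub and extract them to obtain the images and tag files. For the raw dataset, loading with the Waifuc library is recommended; it reads images and their associated tags from a local directory, which simplifies later data processing and model training. Specifically, download the raw archive with `hf_hub_download`, extract it, and iterate over the samples with the `LocalSource` class to access each image, its filename, and its tags. The clustering results can also be used to analyze the character's common features or to help design data augmentation strategies.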
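Background and Challenges
Background Overview
Where generative AI meets anime culture, high-quality image datasets for specific anime characters have become a key foundation for text-to-image models. This dataset was built recently with the DeepGHS team's tooling, focuses on the character Shirase Sakuya from THE iDOLM@STER: SHINY COLORS, and contains 500 tagged images. Its central question is how automated crawling and multi-site data fusion (covering Danbooru, Pixiv, and other platforms) can provide clean, multi-scale training resources for character-specific generation. Besides the raw images and their meta information, it offers several resolution variants and three-stage cropped variants, balancing training efficiency against the preservation of image detail, which makes it useful for benchmarking and fine-tuning in anime character generation.
Current Challenges
The challenges fall into two areas. On the domain side, anime character generation has to reconcile style diversity, pose complexity, and scene generalization. This includes accurately reproducing the visual traits of long-tail characters such as Shirase Sakuya (long straight black hair, high ponytail) and avoiding mode collapse and overfitting driven by very common tags such as "1girl". On the construction side, the challenges come from copyright issues and inconsistent quality in cross-platform crawling, semantic errors in how the automatic tagging pipeline (Waifuc) handles relationships between tags, and the proper filtering of NSFW content (such as the adult-oriented clusters among the dataset's samples), along with privacy and ethical considerations. All of these affect the dataset's cleanliness and usability.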
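Common Scenarios
Typical Use Cases
This dataset focuses on the character Shirase Sakuya from THE iDOLM@STER: SHINY COLORS and contains 500 images with their tags. It serves as fine-tuning and style-transfer material for text-to-image generation. Researchers commonly use such data to train character-specific diffusion models, where fine-grained feature descriptions such as long black hair, yellow eyes, and a high ponytail help the model keep the character's appearance consistent. The resized and cropped packages (for example, crops of at least 480×480 pixels and versions with a shorter side of at most 800 pixels) fit training pipelines with different resolution requirements, making the dataset a useful reference resource in personalized anime character generation pipelines.
Academic Problems Addressed
The dataset addresses the trade-off between preserving character identity and allowing appearance diversity in anime image generation. By providing a clean image collection with structured tags, it lets researchers quantitatively assess how faithfully generative models reproduce fine-grained attributes such as facial features, hairstyle, and clothing. The built-in tag clustering (with groups such as school uniform, swimsuit, and nude scenes) offers a reproducible empirical basis for disentangled attribute learning and conditional generation, supporting methodological work on controllable image synthesis, style transfer, and few-shot learning in the anime domain.
Practical Applications
In practice, the dataset supports scenarios such as virtual idol content creation, game character concept design, and automatic generation of derivative fan works. Creators can fine-tune a pretrained model on it to obtain a tool that draws Shirase Sakuya in different poses and outfits from text descriptions. The multi-resolution packages also suit lightweight deployment on mobile and web, lowering the barrier to personalized anime character generation and promoting AI-assisted creation in the entertainment industry.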
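Recent Research on the Dataset
Latest Research Directions
As generative AI and anime culture converge, character datasets such as this one for Shirase Sakuya from THE iDOLM@STER: SHINY COLORS are becoming a core resource for fine-tuning text-to-image models and for personalized creation. Through multi-source crawling and tagging, the dataset covers the character's signature visual traits, such as long straight black hair and a high ponytail, and its tag clustering reveals semantic associations across outfits and scenes, providing training material for controllable image generation and style transfer. Current research focuses on using fine-grained datasets of this kind to improve the fidelity and generalization of diffusion models for specific character features, and on multimodal alignment techniques that make visual output more consistent with text instructions. Datasets like this one extend what personalized anime character generation can do and open new paths for the digital reproduction of virtual idols and for interactive experiences in the entertainment and cultural industries.
The content above was collected and summarized by 遇见数据集.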













