CyberHarem/sakurai_momoka_idolmastercinderellagirls
收藏Hugging Face2024-01-16 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/CyberHarem/sakurai_momoka_idolmastercinderellagirls
下载链接
链接失效反馈官方服务:
资源简介:
---
license: mit
task_categories:
- text-to-image
tags:
- art
- not-for-all-audiences
size_categories:
- n<1K
---
# Dataset of sakurai_momoka/櫻井桃華 (THE iDOLM@STER: Cinderella Girls)
This is the dataset of sakurai_momoka/櫻井桃華 (THE iDOLM@STER: Cinderella Girls), containing 500 images and their tags.
The core tags of this character are `blonde_hair, green_eyes, short_hair, hairband, bangs, bow`, which are pruned in this dataset.
Images are crawled from many sites (e.g. danbooru, pixiv, zerochan ...), the auto-crawling system is powered by [DeepGHS Team](https://github.com/deepghs)([huggingface organization](https://huggingface.co/deepghs)).
## List of Packages
| Name | Images | Size | Download | Type | Description |
|:-----------------|---------:|:-----------|:------------------------------------------------------------------------------------------------------------------------------------------|:-----------|:---------------------------------------------------------------------|
| raw | 500 | 679.01 MiB | [Download](https://huggingface.co/datasets/CyberHarem/sakurai_momoka_idolmastercinderellagirls/resolve/main/dataset-raw.zip) | Waifuc-Raw | Raw data with meta information (min edge aligned to 1400 if larger). |
| 800 | 500 | 373.32 MiB | [Download](https://huggingface.co/datasets/CyberHarem/sakurai_momoka_idolmastercinderellagirls/resolve/main/dataset-800.zip) | IMG+TXT | dataset with the shorter side not exceeding 800 pixels. |
| stage3-p480-800 | 1261 | 843.64 MiB | [Download](https://huggingface.co/datasets/CyberHarem/sakurai_momoka_idolmastercinderellagirls/resolve/main/dataset-stage3-p480-800.zip) | IMG+TXT | 3-stage cropped dataset with the area not less than 480x480 pixels. |
| 1200 | 500 | 592.83 MiB | [Download](https://huggingface.co/datasets/CyberHarem/sakurai_momoka_idolmastercinderellagirls/resolve/main/dataset-1200.zip) | IMG+TXT | dataset with the shorter side not exceeding 1200 pixels. |
| stage3-p480-1200 | 1261 | 1.20 GiB | [Download](https://huggingface.co/datasets/CyberHarem/sakurai_momoka_idolmastercinderellagirls/resolve/main/dataset-stage3-p480-1200.zip) | IMG+TXT | 3-stage cropped dataset with the area not less than 480x480 pixels. |
### Load Raw Dataset with Waifuc
We provide raw dataset (including tagged images) for [waifuc](https://deepghs.github.io/waifuc/main/tutorials/installation/index.html) loading. If you need this, just run the following code
```python
import os
import zipfile
from huggingface_hub import hf_hub_download
from waifuc.source import LocalSource
# download raw archive file
zip_file = hf_hub_download(
repo_id='CyberHarem/sakurai_momoka_idolmastercinderellagirls',
repo_type='dataset',
filename='dataset-raw.zip',
)
# extract files to your directory
dataset_dir = 'dataset_dir'
os.makedirs(dataset_dir, exist_ok=True)
with zipfile.ZipFile(zip_file, 'r') as zf:
zf.extractall(dataset_dir)
# load the dataset with waifuc
source = LocalSource(dataset_dir)
for item in source:
print(item.image, item.meta['filename'], item.meta['tags'])
```
## List of Clusters
List of tag clustering result, maybe some outfits can be mined here.
### Raw Text Version
| # | Samples | Img-1 | Img-2 | Img-3 | Img-4 | Img-5 | Tags |
|----:|----------:|:----------------------------------|:----------------------------------|:----------------------------------|:----------------------------------|:----------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0 | 29 |  |  |  |  |  | 1girl, smile, looking_at_viewer, solo, blush, hair_flower, braid, rose, white_gloves, petals, open_mouth, red_dress, white_thighhighs |
| 1 | 7 |  |  |  |  |  | 1girl, looking_at_viewer, smile, blush, dress, lolita_hairband, solo, white_background |
| 2 | 5 |  |  |  |  |  | 1girl, looking_at_viewer, puffy_short_sleeves, red_dress, simple_background, smile, solo, white_background, white_shirt, black_footwear, black_ribbon, frilled_dress, full_body, mary_janes, neck_ribbon, standing, closed_mouth, pinafore_dress, white_socks, blush, bobby_socks, hair_between_eyes, red_hairband, skirt_hold, wavy_hair |
| 3 | 11 |  |  |  |  |  | 1girl, blush, solo, hair_flower, looking_at_viewer, red_dress, black_bow, hair_between_eyes, pink_rose, short_over_long_sleeves, frilled_hairband, simple_background, white_background, closed_mouth, puffy_short_sleeves, black_hairband, red_flower, :d, black_thighhighs, frilled_dress, open_mouth |
| 4 | 7 |  |  |  |  |  | 1girl, beret, blue_dress, blush, looking_at_viewer, puffy_short_sleeves, solo, smile, blue_headwear, hair_between_eyes, twin_braids, frilled_dress, plaid_dress, wrist_cuffs, blue_belt, closed_mouth, leaf, pink_bowtie, pleated_dress, shirt, simple_background, white_background |
| 5 | 5 |  |  |  |  |  | 1girl, blush, open_mouth, simple_background, solo, :d, hair_between_eyes, looking_at_viewer, medium_hair, upper_body, hair_bow, pink_hairband, white_background, collarbone, pink_dress, shirt, short_sleeves, wavy_hair, white_dress |
| 6 | 16 |  |  |  |  |  | blush, gym_uniform, white_shirt, 1girl, short_sleeves, solo, looking_at_viewer, gym_shirt, red_hairband, name_tag, red_shorts, open_mouth, simple_background, hair_between_eyes, wavy_hair, white_background, :d, gym_shorts |
| 7 | 17 |  |  |  |  |  | 1girl, long_sleeves, blush, pleated_skirt, solo, blue_skirt, looking_at_viewer, red_bow, blue_shirt, smile, simple_background, white_thighhighs, bowtie, hat, white_background, white_sailor_collar, blue_serafuku, hair_between_eyes, open_mouth, randoseru, zettai_ryouiki, blue_headwear, wavy_hair |
| 8 | 18 |  |  |  |  |  | 1girl, looking_at_viewer, solo, blush, loli, nipples, nude, pussy, small_breasts, smile, navel, open_mouth, simple_background, white_background, uncensored, barefoot, cleft_of_venus, flat_chest, lying, anus |
| 9 | 6 |  |  |  |  |  | blush, looking_at_viewer, ponytail, shirt, solo, 1girl, blue_skirt, cheerleader, midriff, pleated_skirt, simple_background, smile, white_background, bike_shorts, crop_top, holding_pom_poms, navel, one_eye_closed, shorts_under_skirt, sleeveless, sneakers, sweat, armpits, blue_bow, hair_bow, open_mouth, white_socks |
| 10 | 14 |  |  |  |  |  | 1girl, blue_one-piece_swimsuit, competition_school_swimsuit, blush, white_background, looking_at_viewer, simple_background, wavy_hair, solo, small_breasts, ribbon, thighs, beachball, collarbone, name_tag, ass, cowboy_shot, smile, covered_navel, hair_between_eyes, shoes, socks |
| 11 | 5 |  |  |  |  |  | barefoot, blue_one-piece_swimsuit, blush, grey_background, looking_at_viewer, simple_background, small_breasts, 1girl, covered_navel, kneeling, twitter_username, wavy_hair, bare_arms, bare_legs, brown_background, closed_mouth, collarbone, hair_between_eyes, medium_hair, old_school_swimsuit, smile, armpits, arms_behind_head, arms_up, ass_visible_through_thighs, bare_shoulders, hair_bow, multiple_girls, red_bow, red_hairband, solo_focus |
| 12 | 6 |  |  |  |  |  | 1girl, hetero, penis, solo_focus, 1boy, flower, handjob, open_mouth, loli, mosaic_censoring, nude, blush, smile |
### Table Version
| # | Samples | Img-1 | Img-2 | Img-3 | Img-4 | Img-5 | 1girl | smile | looking_at_viewer | solo | blush | hair_flower | braid | rose | white_gloves | petals | open_mouth | red_dress | white_thighhighs | dress | lolita_hairband | white_background | puffy_short_sleeves | simple_background | white_shirt | black_footwear | black_ribbon | frilled_dress | full_body | mary_janes | neck_ribbon | standing | closed_mouth | pinafore_dress | white_socks | bobby_socks | hair_between_eyes | red_hairband | skirt_hold | wavy_hair | black_bow | pink_rose | short_over_long_sleeves | frilled_hairband | black_hairband | red_flower | :d | black_thighhighs | beret | blue_dress | blue_headwear | twin_braids | plaid_dress | wrist_cuffs | blue_belt | leaf | pink_bowtie | pleated_dress | shirt | medium_hair | upper_body | hair_bow | pink_hairband | collarbone | pink_dress | short_sleeves | white_dress | gym_uniform | gym_shirt | name_tag | red_shorts | gym_shorts | long_sleeves | pleated_skirt | blue_skirt | red_bow | blue_shirt | bowtie | hat | white_sailor_collar | blue_serafuku | randoseru | zettai_ryouiki | loli | nipples | nude | pussy | small_breasts | navel | uncensored | barefoot | cleft_of_venus | flat_chest | lying | anus | ponytail | cheerleader | midriff | bike_shorts | crop_top | holding_pom_poms | one_eye_closed | shorts_under_skirt | sleeveless | sneakers | sweat | armpits | blue_bow | blue_one-piece_swimsuit | competition_school_swimsuit | ribbon | thighs | beachball | ass | cowboy_shot | covered_navel | shoes | socks | grey_background | kneeling | twitter_username | bare_arms | bare_legs | brown_background | old_school_swimsuit | arms_behind_head | arms_up | ass_visible_through_thighs | bare_shoulders | multiple_girls | solo_focus | hetero | penis | 1boy | flower | handjob | mosaic_censoring |
|----:|----------:|:----------------------------------|:----------------------------------|:----------------------------------|:----------------------------------|:----------------------------------|:--------|:--------|:--------------------|:-------|:--------|:--------------|:--------|:-------|:---------------|:---------|:-------------|:------------|:-------------------|:--------|:------------------|:-------------------|:----------------------|:--------------------|:--------------|:-----------------|:---------------|:----------------|:------------|:-------------|:--------------|:-----------|:---------------|:-----------------|:--------------|:--------------|:--------------------|:---------------|:-------------|:------------|:------------|:------------|:--------------------------|:-------------------|:-----------------|:-------------|:-----|:-------------------|:--------|:-------------|:----------------|:--------------|:--------------|:--------------|:------------|:-------|:--------------|:----------------|:--------|:--------------|:-------------|:-----------|:----------------|:-------------|:-------------|:----------------|:--------------|:--------------|:------------|:-----------|:-------------|:-------------|:---------------|:----------------|:-------------|:----------|:-------------|:---------|:------|:----------------------|:----------------|:------------|:-----------------|:-------|:----------|:-------|:--------|:----------------|:--------|:-------------|:-----------|:-----------------|:-------------|:--------|:-------|:-----------|:--------------|:----------|:--------------|:-----------|:-------------------|:-----------------|:---------------------|:-------------|:-----------|:--------|:----------|:-----------|:--------------------------|:------------------------------|:---------|:---------|:------------|:------|:--------------|:----------------|:--------|:--------|:------------------|:-----------|:-------------------|:------------|:------------|:-------------------|:----------------------|:-------------------|:----------|:-----------------------------|:-----------------|:-----------------|:-------------|:---------|:--------|:-------|:---------|:----------|:-------------------|
| 0 | 29 |  |  |  |  |  | X | X | X | X | X | X | X | X | X | X | X | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 1 | 7 |  |  |  |  |  | X | X | X | X | X | | | | | | | | | X | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 2 | 5 |  |  |  |  |  | X | X | X | X | X | | | | | | | X | | | | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 3 | 11 |  |  |  |  |  | X | | X | X | X | X | | | | | X | X | | | | X | X | X | | | | X | | | | | X | | | | X | | | | X | X | X | X | X | X | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 4 | 7 |  |  |  |  |  | X | X | X | X | X | | | | | | | | | | | X | X | X | | | | X | | | | | X | | | | X | | | | | | | | | | | | X | X | X | X | X | X | X | X | X | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 5 | 5 |  |  |  |  |  | X | | X | X | X | | | | | | X | | | | | X | | X | | | | | | | | | | | | | X | | | X | | | | | | | X | | | | | | | | | | | | X | X | X | X | X | X | X | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 6 | 16 |  |  |  |  |  | X | | X | X | X | | | | | | X | | | | | X | | X | X | | | | | | | | | | | | X | X | | X | | | | | | | X | | | | | | | | | | | | | | | | | | | X | | X | X | X | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 7 | 17 |  |  |  |  |  | X | X | X | X | X | | | | | | X | | X | | | X | | X | | | | | | | | | | | | | X | | | X | | | | | | | | | | | X | | | | | | | | | | | | | | | | | | | | | | X | X | X | X | X | X | X | X | X | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 8 | 18 |  |  |  |  |  | X | X | X | X | X | | | | | | X | | | | | X | | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | X | X | X | X | X | X | X | X | X | X | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 9 | 6 |  |  |  |  |  | X | X | X | X | X | | | | | | X | | | | | X | | X | | | | | | | | | | | X | | | | | | | | | | | | | | | | | | | | | | | | X | | | X | | | | | | | | | | | | X | X | | | | | | | | | | | | | | X | | | | | | | X | X | X | X | X | X | X | X | X | X | X | X | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 10 | 14 |  |  |  |  |  | X | X | X | X | X | | | | | | | | | | | X | | X | | | | | | | | | | | | | X | | | X | | | | | | | | | | | | | | | | | | | | | | | | X | | | | | | X | | | | | | | | | | | | | | | | | | X | | | | | | | | | | | | | | | | | | | | | X | X | X | X | X | X | X | X | X | X | | | | | | | | | | | | | | | | | | | |
| 11 | 5 |  |  |  |  |  | X | X | X | | X | | | | | | | | | | | | | X | | | | | | | | | X | | | | X | X | | X | | | | | | | | | | | | | | | | | | | | X | | X | | X | | | | | | | | | | | | X | | | | | | | | | | | | X | | | X | | | | | | | | | | | | | | | | X | | X | | | | | | | X | | | X | X | X | X | X | X | X | X | X | X | X | X | X | | | | | | |
| 12 | 6 |  |  |  |  |  | X | X | | | X | | | | | | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | X | | X | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | X | X | X | X | X | X | X |
提供机构:
CyberHarem
原始信息汇总
数据集概述
数据集名称
Dataset of sakurai_momoka/櫻井桃華 (THE iDOLM@STER: Cinderella Girls)
数据集描述
该数据集包含500张图像及其标签,主要描绘角色sakurai_momoka/櫻井桃華(来自游戏《THE iDOLM@STER: Cinderella Girls》)。图像主要标签包括blonde_hair, green_eyes, short_hair, hairband, bangs, bow。
数据来源
图像从多个网站(如danbooru, pixiv, zerochan等)爬取,爬虫系统由DeepGHS Team提供支持。
数据集包列表
| 名称 | 图像数量 | 大小 | 下载链接 | 类型 | 描述 |
|---|---|---|---|---|---|
| raw | 500 | 679.01 MiB | Download | Waifuc-Raw | 包含元信息的原始数据(最小边对齐到1400像素,如果更大)。 |
| 800 | 500 | 373.32 MiB | Download | IMG+TXT | 短边不超过800像素的数据集。 |
| stage3-p480-800 | 1261 | 843.64 MiB | Download | IMG+TXT | 3阶段裁剪数据集,区域不小于480x480像素。 |
| 1200 | 500 | 592.83 MiB | Download | IMG+TXT | 短边不超过1200像素的数据集。 |
| stage3-p480-1200 | 1261 | 1.20 GiB | Download | IMG+TXT | 3阶段裁剪数据集,区域不小于480x480像素。 |
标签聚类结果
原始文本版本
| # | 样本数量 | 图像示例1 | 图像示例2 | 图像示例3 | 图像示例4 | 图像示例5 | 标签 |
|---|---|---|---|---|---|---|---|
| 0 | 29 | ![]() |
![]() |
![]() |
![]() |
![]() |
1girl, smile, looking_at_viewer, solo, blush, hair_flower, braid, rose, white_gloves, petals, open_mouth, red_dress, white_thighhighs |
| 1 | 7 | ![]() |
![]() |
![]() |
![]() |
![]() |
1girl, looking_at_viewer, smile, blush, dress, lolita_hairband, solo, white_background |
| 2 | 5 | ![]() |
![]() |
![]() |
![]() |
![]() |
1girl, looking_at_viewer, puffy_short_sleeves, red_dress, simple_background, smile, solo, white_background, white_shirt, black_footwear, black_ribbon, frilled_dress, full_body, mary_janes, neck_ribbon, standing, closed_mouth, pinafore_dress, white_socks, blush, bobby_socks, hair_between_eyes, red_hairband, skirt_hold, wavy_hair |
| 3 | 11 | ![]() |
![]() |
![]() |
![]() |
![]() |
1girl, blush, solo, hair_flower, looking_at_viewer, red_dress, black_bow, hair_between_eyes, pink_rose, short_over_long_sleeves, frilled_hairband, simple_background, white_background, closed_mouth, puffy_short_sleeves, black_hairband, red_flower, :d, black_thighhighs, frilled_dress, open_mouth |
| 4 | 7 | ![]() |
![]() |
![]() |
![]() |
![]() |
1girl, beret, blue_dress, blush, looking_at_viewer, puffy_short_sleeves, solo, smile, blue_headwear, hair_between_eyes, twin_braids, frilled_dress, plaid_dress, wrist_cuffs, blue_belt, closed_mouth, leaf, pink_bowtie, pleated_dress, shirt, simple_background, white_background |
| 5 | 5 | ![]() |
![]() |
![]() |
![]() |
![]() |
1girl, blush, open_mouth, simple_background, solo, :d, hair_between_eyes, looking_at_viewer, medium_hair, upper_body, hair_bow, pink_hairband, white_background, collarbone, pink_dress, shirt, short_sleeves, wavy_hair, white_dress |
| 6 | 16 | ![]() |
![]() |
![]() |
![]() |
![]() |
blush, gym_uniform, white_shirt, 1girl, short_sleeves, solo, looking_at_viewer, gym_shirt, red_hairband, name_tag, red_shorts, open_mouth, simple_background, hair_between_eyes, wavy_hair, white_background, :d, gym_shorts |
| 7 | 17 | ![]() |
![]() |
![]() |
![]() |
![]() |
1girl, long_sleeves, blush, pleated_skirt, solo, blue_skirt, looking_at_viewer, red_bow, blue_shirt, smile, simple_background, white_thighhighs, bowtie, hat, white_background, white_sailor_collar, blue_serafuku, hair_between_eyes, open_mouth, randoseru, zettai_ryouiki, blue_headwear, wavy_hair |
| 8 | 18 | ![]() |
![]() |
![]() |
![]() |
![]() |
1girl, looking_at_viewer, solo, blush, loli, nipples, nude, pussy, small_breasts, smile, navel, open_mouth, simple_background, white_background, uncensored, barefoot, cleft_of_venus, flat_chest, lying, anus |
| 9 | 6 | ![]() |
![]() |
![]() |
![]() |
![]() |
blush, looking_at_viewer, ponytail, shirt, solo, 1girl, blue_skirt, cheerleader, midriff, pleated_skirt, simple_background, smile, white_background, bike_shorts, crop_top, holding_pom_poms, navel, one_eye_closed, shorts_under_skirt, sleeveless, sneakers, sweat, armpits, blue_bow, hair_bow, open_mouth, white_socks |
| 10 | 14 | ![]() |
![]() |
![]() |
![]() |
![]() |
1girl, blue_one-piece_swimsuit, competition_school_swimsuit, blush, white_background, looking_at_viewer, simple_background, wavy_hair, solo, small_breasts, ribbon, thighs, beachball, collarbone, name_tag, ass, cowboy_shot, smile, covered_navel, hair_between_eyes, shoes, socks |
| 11 | 5 | ![]() |
![]() |
![]() |
![]() |
![]() |
barefoot, blue_one-piece_swimsuit, blush, grey_background, looking_at_viewer, simple_background, small_breasts, 1girl, covered_navel, kneeling, twitter_username, wavy_hair, bare_arms, bare_legs, brown_background, closed_mouth, collarbone, hair_between_eyes, medium_hair, old_school_swimsuit, smile, armpits, arms_behind_head, arms_up, ass_visible_through_thighs, bare_shoulders, hair_bow, multiple_girls, red_bow, red_hairband, solo_focus |
| 12 | 6 | ![]() |
![]() |
![]() |
![]() |
![]() |
1girl, hetero, penis, solo_focus, 1boy, flower, handjob, open_mouth, loli, mosaic_censoring, nude, blush, smile |
搜集汇总
数据集介绍

构建方式
该数据集聚焦于《偶像大师:灰姑娘女孩》中的角色櫻井桃華,由DeepGHS团队基于自动化爬取系统构建,图像来源涵盖Danbooru、Pixiv、Zerochan等多个知名插画平台。原始数据包含500张图像及其关联标签,核心特征如金发、绿瞳、短发等已在数据集中被剔除,以增强模型对角色本质特征的泛化能力。数据集提供多种处理版本,包括原始元数据包、短边限制为800或1200像素的标准化版本,以及通过三阶段裁剪策略生成的增强版,后者在保持图像质量的同时扩展了样本数量至1261张。
特点
数据集以角色为中心,具备高度的主题一致性,所有图像均围绕櫻井桃華这一特定角色展开,适合用于文本到图像生成模型的微调。其标签系统丰富而精细,原始标签经过聚类分析后形成多个语义簇,如不同服饰风格(水手服、啦啦队服、泳装)、姿势(微笑、注视观众)及场景(纯色背景、户外),为模型学习提供了多样化的视觉语境。数据集还包含明确的许可信息(MIT协议),并标注了内容分级,确保使用的合规性。
使用方法
使用者可通过HuggingFace Hub直接下载压缩包,支持Python环境下的快速加载。推荐使用waifuc库处理原始数据集,通过LocalSource接口即可迭代访问图像及其元数据,实现灵活的批处理。对于需要标准化输入的生成任务,可直接选用800或1200像素的预裁剪版本;若需扩充数据量并提升鲁棒性,建议采用三阶段裁剪数据集。此外,聚类结果提供了标签与图像的对应关系,可用于训练前的数据筛选或增强策略设计。
背景与挑战
背景概述
在生成式人工智能领域,文本到图像(text-to-image)模型的发展日益依赖于高质量、细粒度的数据集。CyberHarem团队于近年构建的sakurai_momoka_idolmastercinderellagirls数据集,聚焦于《偶像大师灰姑娘女孩》中的角色櫻井桃華,旨在为动漫风格的角色生成任务提供标准化素材。该数据集由DeepGHS团队主导,通过自动化爬虫系统从Danbooru、Pixiv、Zerochan等多平台采集500张图像,并附带精细的标签注释,涵盖了角色核心特征(如金发、绿瞳)及多样化服饰与场景。其影响力体现在为二次元角色定制化生成研究提供了可复用的基准资源,推动了小样本、高一致性角色生成技术的探索。
当前挑战
该数据集面临的核心挑战首先在于领域问题的复杂性:文本到图像生成需在保持角色身份一致性的同时,应对姿态、服饰、背景的多样性,而仅有500张样本的规模对模型泛化能力构成严峻考验。其次,构建过程中遭遇多重技术障碍:多源图像采集需处理版权合规与质量筛选,自动标签系统需精确解析角色核心属性并避免噪声干扰;数据预处理中,图像尺寸统一(如800像素、1200像素)与裁剪策略(三级裁剪)需平衡细节保留与存储效率;此外,聚类分析暴露了数据分布不均问题,部分类别(如裸露内容)样本稀少,可能引入偏见或生成稳定性风险。
常用场景
经典使用场景
该数据集聚焦于《偶像大师:灰姑娘女孩》中的角色樱井桃华,收录了500张经过精细标注的图像及其关联标签。其最经典的用途在于微调文本到图像生成模型,如Stable Diffusion系列,以实现对特定动漫角色风格的高保真复现。通过使用该数据集提供的多尺度裁剪版本(如800px和1200px)或经过阶段式裁剪的stage3版本,研究者能够训练模型精准捕捉角色标志性的金发、绿瞳、短发与发箍等核心视觉特征,从而在生成任务中保持角色身份的一致性。
实际应用
在实际应用层面,该数据集广泛服务于二次元内容创作与数字娱乐产业。创作者可借助基于此数据集微调的模型,高效生成樱井桃华在多种场景、服饰或姿态下的全新插画,极大降低同人创作与角色衍生设计的成本。此外,该数据集也可用于构建角色识别与分类系统,辅助游戏开发、虚拟偶像运营及自动化图像标注工具,提升相关产业在角色资产管理和内容生成环节的自动化水平。
衍生相关工作
此数据集衍生了多项具有影响力的工作。一方面,它作为角色定制微调的基础语料,启发了诸如DreamBooth和LoRA等参数高效微调方法在动漫领域的应用,推动了面向特定角色的小样本生成技术发展。另一方面,数据集内提供的标签聚类结果,为后续研究如基于服饰或场景的角色风格迁移、无监督角色概念分解等提供了分析范例。此外,其配套的waifuc工具链也促进了大规模动漫图像自动化采集与清洗流程的标准化,成为相关开源生态的重要基石。
以上内容由遇见数据集搜集并总结生成




































































