CyberHarem/shimakaze_kantaicollection
Source: Hugging Face. Updated 2024-01-14; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/CyberHarem/shimakaze_kantaicollection
Overview:
This dataset features shimakaze/島風 from the Kantai Collection franchise and contains 500 images with their tags. The images were crawled from several sites (danbooru, pixiv, zerochan, and others) using a crawling system provided by the DeepGHS team. The core tags of the character are `blonde_hair, long_hair, hairband, hair_ornament`; these tags have been pruned in the dataset. Several packages are available: the raw data, resized images at different resolutions, and cropped versions. Sample code for loading the raw dataset with waifuc is provided, along with tag clustering results.
Provider:
CyberHarem
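Original information summary
Dataset overview
Dataset information
- Name: Dataset of shimakaze/島風 (Kantai Collection)
- License: MIT
- Task category: text-to-image
- Tags: art, not-for-all-audiences
- Size category: n<1K
- Description: 500 images with their tags; the core tags are `blonde_hair, long_hair, hairband, hair_ornament`.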
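Package list
| Name | Images | Size | Type | Description |
|---|---|---|---|---|
| raw | 500 | 584.48 MiB | Waifuc-Raw | Raw data with meta information (shorter side aligned to 1400 pixels if larger). |
| 800 | 500 | 372.58 MiB | IMG+TXT | Dataset with the shorter side not exceeding 800 pixels. |
| stage3-p480-800 | 1226 | 767.47 MiB | IMG+TXT | 3-stage cropped dataset with regions no smaller than 480x480 pixels. |
| 1200 | 500 | 533.53 MiB | IMG+TXT | Dataset with the shorter side not exceeding 1200 pixels. |
| stage3-p480-1200 | 1226 | 1003.65 MiB | IMG+TXT | 3-stage cropped dataset with regions no smaller than 480x480 pixels. |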
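The IMG+TXT packages listed above can be read without any special tooling. The following is a minimal sketch, assuming each image sits next to a `.txt` file with the same stem that holds comma-separated tags. That layout, the helper name `read_img_txt_pairs`, and the set of image suffixes are assumptions for illustration and are not confirmed by this page.

```python
from pathlib import Path

# Assumed image formats; adjust to whatever the extracted package contains.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def read_img_txt_pairs(root):
    """Return a list of (image_path, tags) pairs found under `root`.

    Hypothetical helper: assumes every image has a same-stem .txt file
    containing comma-separated tags. Images without a tag file are skipped.
    """
    pairs = []
    for image_path in sorted(Path(root).rglob("*")):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        tag_path = image_path.with_suffix(".txt")
        if not tag_path.is_file():
            continue
        text = tag_path.read_text(encoding="utf-8")
        # Split on commas and drop empty fragments and surrounding whitespace.
        tags = [t.strip() for t in text.split(",") if t.strip()]
        pairs.append((image_path, tags))
    return pairs
```

Calling `read_img_txt_pairs("dataset_dir")` on an extracted package directory would yield pairs that can be fed to a training pipeline or inspected directly.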
Data loading
- Tool: waifuc
- Code example:

```python
import os
import zipfile

from huggingface_hub import hf_hub_download
from waifuc.source import LocalSource

# Download the raw archive file
zip_file = hf_hub_download(
    repo_id='CyberHarem/shimakaze_kantaicollection',
    repo_type='dataset',
    filename='dataset-raw.zip',
)

# Extract the files to a local directory
dataset_dir = 'dataset_dir'
os.makedirs(dataset_dir, exist_ok=True)
with zipfile.ZipFile(zip_file, 'r') as zf:
    zf.extractall(dataset_dir)

# Load the dataset with waifuc
source = LocalSource(dataset_dir)
for item in source:
    print(item.image, item.meta['filename'], item.meta['tags'])
```
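Once items are loaded, a quick tag frequency count can help verify the contents. The sketch below works on plain metadata dictionaries rather than waifuc objects, so it runs without the library. It assumes that the `tags` entry is either a mapping from tag name to score or a plain list of tag names; this page does not confirm which layout is used, and `count_tags` is a hypothetical helper.

```python
from collections import Counter


def count_tags(metas):
    """Count how many metadata dicts mention each tag.

    Assumption: meta['tags'] is a mapping (tag -> score) or an iterable of
    tag names. Iterating either one yields the tag names.
    """
    counter = Counter()
    for meta in metas:
        for tag in meta.get("tags", ()):
            counter[tag] += 1
    return counter
```

With the loader above, something like `count_tags(item.meta for item in source).most_common(10)` would list the ten most frequent tags, provided the assumption about the metadata layout holds.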
Tag clustering results
- Formats: text and table
- Examples:

Text format:

| # | Samples | Image 1 | Image 2 | Image 3 | Image 4 | Image 5 | Tags |
|---|---|---|---|---|---|---|---|
| 0 | 14 | ... | ... | ... | ... | ... | 1girl, elbow_gloves, solo, striped_thighhighs, white_gloves, black_panties, looking_at_viewer, skirt, navel, brown_eyes, blush, yellow_eyes, midriff |
| 1 | 5 | ... | ... | ... | ... | ... | 1girl, :3, >_<, black_panties, brown_eyes, closed_eyes, elbow_gloves, lifebuoy, looking_at_viewer, skirt, solo, striped_thighhighs, white_gloves, anchor, blush, navel, midriff, yellow_eyes |
| ... | ... | ... | ... | ... | ... | ... | ... |
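Table format:

| # | Samples | Image 1 | Image 2 | Image 3 | Image 4 | Image 5 | 1girl | elbow_gloves | solo | striped_thighhighs | white_gloves | black_panties | looking_at_viewer | skirt | navel | brown_eyes | blush | yellow_eyes | midriff | :3 | >_< | closed_eyes | lifebuoy | anchor | highleg_panties | blue_eyes | crop_top | sailor_collar | pleated_skirt | serafuku | blue_skirt | simple_background | miniskirt | white_background | microskirt | grey_eyes | black_neckerchief | anchor_hair_ornament | black_hairband | blue_sailor_collar | hair_between_eyes | sleeveless | upper_body | open_mouth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14 | ... | ... | ... | ... | ... | X | X | X | X | X | X | X | X | X | X | X | X | X | | | | | | | | | | | | | | | | | | | | | | | | | |
| 1 | 5 | ... | ... | ... | ... | ... | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | | | | | | | | | | | | | | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |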
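The two formats above carry the same information: the table format is a one-hot view of the per-cluster tag lists. The sketch below illustrates that relation using the two clusters shown here. `build_tag_matrix` is a hypothetical helper, and ordering columns by first appearance is an assumption that happens to match the first 18 columns of the example, not a documented rule.

```python
def build_tag_matrix(clusters):
    """Turn per-cluster tag lists into (columns, rows) of a one-hot table.

    Columns are tags in order of first appearance (an assumed ordering).
    Each row holds "X" where the cluster has the tag and "" elsewhere.
    """
    columns = []
    for tags in clusters:
        for tag in tags:
            if tag not in columns:
                columns.append(tag)
    rows = []
    for tags in clusters:
        present = set(tags)
        rows.append(["X" if col in present else "" for col in columns])
    return columns, rows


# Tag lists of clusters 0 and 1, copied from the text-format example.
CLUSTER_0 = [
    "1girl", "elbow_gloves", "solo", "striped_thighhighs", "white_gloves",
    "black_panties", "looking_at_viewer", "skirt", "navel", "brown_eyes",
    "blush", "yellow_eyes", "midriff",
]
CLUSTER_1 = [
    "1girl", ":3", ">_<", "black_panties", "brown_eyes", "closed_eyes",
    "elbow_gloves", "lifebuoy", "looking_at_viewer", "skirt", "solo",
    "striped_thighhighs", "white_gloves", "anchor", "blush", "navel",
    "midriff", "yellow_eyes",
]

columns, rows = build_tag_matrix([CLUSTER_0, CLUSTER_1])
```

For these two clusters the helper produces 18 columns, with 13 marks in the first row and 18 in the second, which agrees with the rows shown in the table format.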



