
nyanko7/yandere2023

Hugging Face · updated 2024-05-06 · indexed 2024-06-22
Download link:
https://hf-mirror.com/datasets/nyanko7/yandere2023
Description:
---
license: mit
task_categories:
- image-classification
- image-to-image
- text-to-image
language:
- en
- ja
pretty_name: yandere2023
size_categories:
- 1M<n<10M
---

# Yandere2023: A Large-Scale Crowdsourced and Tagged Anime Illustration Dataset

<!-- yande.re -->

Yandere2023 is a comprehensive anime image dataset of over 1.2 million high-quality images drawn from a variety of sources, including key frames, manga scans, and artbooks. Although the average number of tags per image is relatively low, the collection is diverse and of exceptional image quality.

- **Shared by:** Nyanko Devs
- **Language(s):** English, Japanese
- **License:** MIT

## Uses

## Format

The goal of the dataset is to be as easy as possible to use immediately, avoiding obscure file formats, while allowing simultaneous research and seeding of the torrent, with easy updates.

Images are provided in their full original form (be that JPG, PNG, GIF, or otherwise) for reference/archival purposes, and are bucketed into 1000 subdirectories, `0000`–`0999` (zero-padded), by the Yandere ID modulo 1000 (i.e. all images in `0999/` have an ID ending in `999`). An ID is turned into a path by taking it modulo 1000 and zero-padding the result to four digits (e.g. in Bash, `BUCKET=$(printf "%04d" $(( ID % 1000 )))`); the file is then at `original/$BUCKET/$ID.$EXT`.

The bucketing exists because a single directory would cause pathological filesystem performance, and the ID modulo 1000 is a simple hash that spreads images evenly without ever requiring new directories as IDs grow, and without any filesystem I/O to find which directory holds a file. The ID in the filename is not zero-padded and each file keeps its original extension, hence the file layout looks like this:

```bash
$ tree / | less
/
├── yandere2023 -> /mnt/diffusionstorage/workspace/yandere/
│   ├── metadata
│   ├── readme.md
│   ├── original
│   │   ├── 0000 -> data-0000.tar
│   │   ├── 0001 -> data-0001.tar
│   │   │   ├── 10001.jpg
│   │   │   ├── 210001.png
│   │   │   ├── 3120001.webp
│   │   │   ├── 6513001.jpg
```
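The ID-to-path rule above can be wrapped in a small Bash function. This is a minimal sketch, not part of the dataset's own tooling: the function name `yandere_path` is hypothetical, and the caller is assumed to already know the file extension, since the ID alone does not determine it.

```bash
#!/usr/bin/env bash
# Sketch: map a Yandere post ID and a file extension to its location in the
# bucketed layout original/$BUCKET/$ID.$EXT.
# yandere_path is a hypothetical helper name, not part of the dataset.

yandere_path() {
  local id=$1 ext=$2
  # Reject anything that is not a non-negative decimal integer.
  if [[ ! $id =~ ^[0-9]+$ ]]; then
    echo "invalid ID: $id" >&2
    return 1
  fi
  # 10# forces base 10, so a stray leading zero is not parsed as octal;
  # this also normalises the ID, because filenames are not zero-padded.
  id=$(( 10#$id ))
  local bucket
  bucket=$(printf "%04d" $(( id % 1000 )))
  printf 'original/%s/%s.%s\n' "$bucket" "$id" "$ext"
}

yandere_path 10001 jpg      # original/0001/10001.jpg
yandere_path 6513001 jpg    # original/0001/6513001.jpg
yandere_path 999 png        # original/0999/999.png
```

Because the bucket is a pure function of the ID, the lookup needs no directory listing, which is the property the README relies on.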
Provided by:
nyanko7
Summary of original information

Yandere2023: A Large-Scale Crowdsourced and Tagged Anime Illustration Dataset

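Yandere2023 is a comprehensive anime image dataset containing over 1.2 million high-quality images, sourced from key frames, manga scans, artbooks, and more. Although the average number of tags per image is relatively low, the dataset stands out for its diversity and image quality.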

  • Shared by: Nyanko Devs
  • Language(s): English, Japanese
  • License: MIT

Uses

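The goal of the dataset is to be as easy as possible to use immediately, avoiding obscure file formats, while allowing research and torrent seeding to happen at the same time, with easy updates.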

Format

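Images are provided in their original form (JPG, PNG, GIF, etc.) for reference/archival purposes and are bucketed into 1000 subdirectories, 0000–0999 (zero-padded), according to the Yandere ID modulo 1000 (i.e. all images in 0999/ have IDs ending in 999). An ID is converted into a path by taking it modulo 1000 and zero-padding the result to four digits (in Bash, for example: `BUCKET=$(printf "%04d" $(( ID % 1000 )))`); the file is then located at `original/$BUCKET/$ID.$EXT`.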
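The same rule can be run in reverse to sanity-check a local copy of the layout. The sketch below is an illustration under stated assumptions: `in_correct_bucket` is a hypothetical helper, paths are assumed to have the form `.../BUCKET/ID.EXT`, and the check only inspects the path string without touching the filesystem.

```bash
#!/usr/bin/env bash
# Sketch: verify that a path obeys the bucketing rule, i.e. that its parent
# directory name equals the zero-padded ID modulo 1000.
# in_correct_bucket is a hypothetical helper; paths are assumed to look like
# .../BUCKET/ID.EXT.

in_correct_bucket() {
  local path=$1 bucket file id
  bucket=$(basename "$(dirname "$path")")   # e.g. 0001
  file=$(basename "$path")                  # e.g. 10001.jpg
  id=${file%%.*}                            # drop the extension: 10001
  [[ $id =~ ^[0-9]+$ ]] || return 1
  # Compare as strings with the expected zero-padded bucket name.
  [[ $bucket == "$(printf "%04d" $(( 10#$id % 1000 )))" ]]
}

for p in original/0001/10001.jpg original/0999/6513999.png original/0002/210001.png; do
  if in_correct_bucket "$p"; then
    printf '%-9s %s\n' ok "$p"
  else
    printf '%-9s %s\n' MISPLACED "$p"
  fi
done
```

Here the last path is reported as misplaced, because 210001 modulo 1000 is 1 and the file therefore belongs in 0001/.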

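Bucketing is used because a single directory would cause pathological filesystem performance, while the ID modulo 1000 is a simple hash that spreads images evenly without ever requiring new directories to be created or any filesystem I/O to find which directory holds a file. IDs in filenames are not zero-padded and files keep their original extensions, so the file layout looks like this: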

```bash
$ tree / | less
/
├── yandere2023 -> /mnt/diffusionstorage/workspace/yandere/
│   ├── metadata
│   ├── readme.md
│   ├── original
│   │   ├── 0000 -> data-0000.tar
│   │   ├── 0001 -> data-0001.tar
│   │   │   ├── 10001.jpg
│   │   │   ├── 210001.png
│   │   │   ├── 3120001.webp
│   │   │   ├── 6513001.jpg
```
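The claim that ID modulo 1000 spreads files evenly can be checked on a run of consecutive IDs: the remainder cycles through 0 to 999, so any 1000·k consecutive IDs put exactly k files in every bucket. This is a minimal sketch with a hypothetical helper, `bucket_spread`; real post IDs are not guaranteed to be contiguous (for example, if some posts are missing from the dataset), so actual bucket sizes will only be approximately equal.

```bash
#!/usr/bin/env bash
# Sketch: tally COUNT consecutive IDs starting at START into buckets by
# ID % 1000 and print the smallest and largest bucket sizes.
# bucket_spread is a hypothetical helper for illustration only.

bucket_spread() {
  local start=$1 count=$2
  local -a counts
  local id b c min max
  # Initialise all 1000 buckets to zero.
  for (( b = 0; b < 1000; b++ )); do counts[$b]=0; done
  # Place each ID in bucket ID % 1000.
  for (( id = start; id < start + count; id++ )); do
    b=$(( id % 1000 ))
    counts[$b]=$(( counts[$b] + 1 ))
  done
  # Find the smallest and largest bucket.
  min=${counts[0]} max=${counts[0]}
  for c in "${counts[@]}"; do
    if (( c < min )); then min=$c; fi
    if (( c > max )); then max=$c; fi
  done
  echo "min=$min max=$max"
}

bucket_spread 1 10000         # min=10 max=10
bucket_spread 1234567 2500    # min=2 max=3
```

With 2500 consecutive IDs, two full cycles fill every bucket twice and the remaining 500 IDs add one more file to 500 buckets, so no two buckets ever differ by more than one file.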
