
Jakaline/Danbooru2023_metadata

Hugging Face, updated 2024-01-13, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/Jakaline/Danbooru2023_metadata
Description:
---
size_categories:
- 1M<n<10M
tags:
- not-for-all-audiences
---

# Danbooru2023_metadata

Size: 6,391,111 (6.39M)

Metadata of posts from [Danbooru](https://danbooru.donmai.us), up to post `#7042183`. Contains only active posts (see below). Child posts marked `pixel-perfect_duplicate` were also removed.

## Columns

`id, md5, created_at, updated_at, score, up_score, down_score, rating, image_width, image_height, file_ext, parent_id, duration, pixel_hash, tag_string_general, tag_string_character, tag_string_copyright, tag_string_artist, tag_string_meta`

### Rating

- `g` (general): Completely safe for work.
- `s` (sensitive): Probably not safe for work.
- `q` (questionable): Softcore erotica.
- `e` (explicit): Hardcore erotica. Definitely not safe for work.

For more information, [see link](https://danbooru.donmai.us/wiki_pages/howto:rate)

## Specifics of Danbooru

### Status

Every Danbooru post has one of four statuses:

1. Active: Posts that are approved by a moderator.
2. Pending: Posts that are waiting to be approved. If a post is pending for more than 3 days without a moderator's approval, the post is deleted.
3. Deleted: Posts that do not meet Danbooru's standards. (Non-anime, low quality, etc.)
4. Banned: Posts that are copyright-claimed or contain off-limits content.

Therefore, we should only use images from active posts. (Banned posts cannot be downloaded by normal users.) Sadly, the [Danbooru2021 dataset](https://gwern.net/danbooru2021) did not filter out any deleted posts, and many text-to-image models were trained on deleted posts.

Of a total of 7,042,183 posts, 224,522 are banned and 370,595 are deleted. Deleted posts make up 5.26% of the total, which is not trivial. Therefore, this dataset does not include them.

## Misc

- If you aim to create your own anime-based image dataset from Danbooru posts, you should definitely exclude posts with the following tags: `cosplay_photo third-party_edit text-only_page`
- Also consider excluding posts with the following tags, if you are aiming for high quality: `photo_(medium) 3d no_humans comic`

## Other Links

[Looking for parquet files?](https://huggingface.co/datasets/Jakaline/Danbooru2023_metadata/tree/refs%2Fconvert%2Fparquet/default/train)

Looking for images? [nyanko7/danbooru2023](https://huggingface.co/datasets/nyanko7/danbooru2023)
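
The tag-exclusion advice in the Misc section can be sketched in plain Python. This is a minimal illustration, not the author's pipeline. It assumes each `tag_string_*` column holds space-separated tags, and it checks all five tag columns because the card does not say which column each excluded tag lives in. The helper names (`post_tags`, `keep_post`) and the sample rows are hypothetical.

```python
# Tags the card says to always exclude, and tags to exclude for higher quality.
ALWAYS_EXCLUDE = {"cosplay_photo", "third-party_edit", "text-only_page"}
QUALITY_EXCLUDE = {"photo_(medium)", "3d", "no_humans", "comic"}

# The five tag columns listed in the card.
TAG_COLUMNS = (
    "tag_string_general",
    "tag_string_character",
    "tag_string_copyright",
    "tag_string_artist",
    "tag_string_meta",
)


def post_tags(post):
    """Collect every tag of a post (assumption: tags are space-separated)."""
    tags = set()
    for column in TAG_COLUMNS:
        tags.update((post.get(column) or "").split())
    return tags


def keep_post(post, strict=False):
    """Return True if the post carries none of the excluded tags."""
    excluded = (ALWAYS_EXCLUDE | QUALITY_EXCLUDE) if strict else ALWAYS_EXCLUDE
    return post_tags(post).isdisjoint(excluded)


# Hypothetical rows, made up for illustration only.
sample_posts = [
    {"id": 1, "tag_string_general": "1girl solo smile", "tag_string_meta": "highres"},
    {"id": 2, "tag_string_general": "cosplay_photo", "tag_string_meta": "photo_(medium)"},
    {"id": 3, "tag_string_general": "comic monochrome", "tag_string_meta": ""},
]

kept_default = [p["id"] for p in sample_posts if keep_post(p)]
kept_strict = [p["id"] for p in sample_posts if keep_post(p, strict=True)]
print(kept_default)
print(kept_strict)
```

Matching whole tags in a set, rather than searching for substrings, avoids accidental hits such as `3d` matching inside a longer tag name.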
Provider:
Jakaline
Summary of the original information

Danbooru2023_metadata

Overview

  • Size: 6,391,111 (6.39M)
  • Source: Danbooru
  • Contents: metadata up to post #7042183; only active posts are included, and child posts marked pixel-perfect_duplicate were removed.

Columns

  • id, md5, created_at, updated_at, score, up_score, down_score, rating, image_width, image_height, file_ext, parent_id, duration, pixel_hash, tag_string_general, tag_string_character, tag_string_copyright, tag_string_artist, tag_string_meta

Rating

  • g (general): Completely safe for work.
  • s (sensitive): Probably not safe for work.
  • q (questionable): Softcore erotica.
  • e (explicit): Hardcore erotica. Definitely not safe for work.
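
As a rough illustration of how the `rating` column might be used, the sketch below keeps only posts whose rating code is in an allowed set. It assumes the column stores the one-letter codes listed above. The function name `filter_by_rating` and the sample rows are hypothetical.

```python
# Rating codes and labels as listed in the dataset card.
RATING_LABELS = {
    "g": "general",
    "s": "sensitive",
    "q": "questionable",
    "e": "explicit",
}


def filter_by_rating(posts, allowed=("g",)):
    """Keep posts whose rating code is in `allowed`; reject unknown codes."""
    allowed = set(allowed)
    unknown = allowed - set(RATING_LABELS)
    if unknown:
        raise ValueError(f"unknown rating codes: {sorted(unknown)}")
    return [post for post in posts if post["rating"] in allowed]


# Hypothetical rows, made up for illustration only.
sample_posts = [
    {"id": 10, "rating": "g"},
    {"id": 11, "rating": "s"},
    {"id": 12, "rating": "q"},
    {"id": 13, "rating": "e"},
]

general_only = [p["id"] for p in filter_by_rating(sample_posts)]
general_and_sensitive = [p["id"] for p in filter_by_rating(sample_posts, allowed=("g", "s"))]
print(general_only)
print(general_and_sensitive)
```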

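Danbooru post status

  1. Active: Posts approved by a moderator.
  2. Pending: Posts waiting for approval. A post that is not approved within 3 days is deleted.
  3. Deleted: Posts that do not meet Danbooru's standards (non-anime, low quality, etc.).
  4. Banned: Posts that are copyright-claimed or contain off-limits content.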
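
The dataset card gives counts for two of these statuses: 224,522 banned and 370,595 deleted posts out of 7,042,183 in total. The short check below reproduces the 5.26% deleted share quoted in the card. The variable names are illustrative, and the card does not break down how the remaining posts reduce to the final 6,391,111 rows.

```python
# Counts quoted in the dataset card.
total_posts = 7_042_183
banned_posts = 224_522
deleted_posts = 370_595

# Share of each status as a percentage of all posts.
deleted_share = 100 * deleted_posts / total_posts
banned_share = 100 * banned_posts / total_posts

# Posts left after dropping banned and deleted ones.
remaining_posts = total_posts - banned_posts - deleted_posts

print(f"deleted: {deleted_share:.2f}%")
print(f"banned: {banned_share:.2f}%")
print(f"remaining: {remaining_posts:,}")
```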

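Other information

  • If you build your own anime image dataset from Danbooru posts, exclude posts with the following tags: cosplay_photo third-party_edit text-only_page
  • If you are aiming for high quality, also exclude posts with the following tags: photo_(medium) 3d no_humans comic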