Jakaline/Danbooru2023_metadata
Source: Hugging Face. Updated 2024-01-13. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Jakaline/Danbooru2023_metadata
---
size_categories:
- 1M<n<10M
tags:
- not-for-all-audiences
---
# Danbooru2023_metadata
Size: 6,391,111 (6.39M)
Metadata of posts from [Danbooru](https://danbooru.donmai.us), up to post `#7042183`.
Contains only active posts (see below). Child posts tagged `pixel-perfect_duplicate` were also removed.
## Columns
`id, md5, created_at, updated_at, score, up_score, down_score, rating, image_width, image_height, file_ext, parent_id, duration, pixel_hash, tag_string_general, tag_string_character, tag_string_copyright, tag_string_artist, tag_string_meta`
### Rating
- `g` (general): Completely safe for work.
- `s` (sensitive): Probably not safe for work.
- `q` (questionable): Softcore erotica.
- `e` (explicit): Hardcore erotica. Definitely not safe for work.
For more information, see Danbooru's [howto:rate](https://danbooru.donmai.us/wiki_pages/howto:rate) wiki page.
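
As a rough illustration of how the `rating` column might be used, here is a minimal sketch that keeps only rows with selected ratings. It assumes the column stores the single-letter codes listed above; `filter_by_rating` and the sample rows are hypothetical and not part of the dataset.

```python
# Rating codes as listed above.
RATING_LABELS = {
    "g": "general",
    "s": "sensitive",
    "q": "questionable",
    "e": "explicit",
}


def filter_by_rating(rows, allowed=("g",)):
    """Keep only rows whose `rating` value is one of the `allowed` codes."""
    unknown = set(allowed) - set(RATING_LABELS)
    if unknown:
        raise ValueError(f"unknown rating codes: {sorted(unknown)}")
    return [row for row in rows if row["rating"] in allowed]


# Hypothetical sample rows; ids and ratings are made up for illustration.
sample_rows = [
    {"id": 1, "rating": "g"},
    {"id": 2, "rating": "s"},
    {"id": 3, "rating": "q"},
    {"id": 4, "rating": "e"},
]

safe_rows = filter_by_rating(sample_rows, allowed=("g", "s"))
print([row["id"] for row in safe_rows])
```

The same predicate could be applied to rows read from the parquet files, provided the column really holds these codes.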
## Specifics of Danbooru
### Status
Every Danbooru post has one of four statuses.
1. Active: Posts that are approved by a moderator.
2. Pending: Posts that are waiting to be approved. If a post stays pending for more than 3 days without a moderator's approval, it is deleted.
3. Deleted: Posts that do not meet Danbooru's standards (non-anime, low quality, etc.).
4. Banned: Posts that are copyright-claimed or contain off-limit content.
Therefore, we should only use images from active posts. (Banned posts cannot be downloaded by normal users.)
Sadly, the [Danbooru2021 dataset](https://gwern.net/danbooru2021) did not filter out deleted posts, and many text-to-image models were trained on them.
Of a total of 7,042,183 posts, 224,522 are banned and 370,595 are deleted. Deleted posts make up 5.26% of all posts, which is not trivial. Therefore, this dataset does not include them.
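
The figures above can be checked with a few lines of arithmetic. This sketch uses only the counts quoted in this section; the variable names are illustrative.

```python
# Counts quoted in this section.
total_posts = 7_042_183
banned_posts = 224_522
deleted_posts = 370_595

deleted_share = deleted_posts / total_posts
banned_share = banned_posts / total_posts
# Posts that are neither banned nor deleted.
remaining = total_posts - banned_posts - deleted_posts

print(f"deleted: {deleted_share:.2%}")
print(f"banned: {banned_share:.2%}")
print(f"neither banned nor deleted: {remaining:,}")
```

The remaining count (6,447,066) is larger than the dataset size of 6,391,111. The gap presumably corresponds to pending posts and the removed duplicate child posts, though the card does not break this down.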
## Misc
- If you aim to create your own anime-based image dataset from Danbooru posts, you should definitely exclude posts with the following tags: `cosplay_photo third-party_edit text-only_page`
- Also consider excluding posts with the following tags, if you are aiming for high quality: `photo_(medium) 3d no_humans comic`
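
The two exclusion rules above could be applied roughly as follows. This is a sketch under two assumptions: that each `tag_string_*` column holds space-separated tags, and that a tag may appear in any of those columns, so all five are checked. `post_tags`, `exclude_tagged`, and the sample rows are hypothetical.

```python
# Tag columns from the Columns section.
TAG_COLUMNS = (
    "tag_string_general",
    "tag_string_character",
    "tag_string_copyright",
    "tag_string_artist",
    "tag_string_meta",
)

# Tags named in the two bullets above.
ALWAYS_EXCLUDE = {"cosplay_photo", "third-party_edit", "text-only_page"}
QUALITY_EXCLUDE = {"photo_(medium)", "3d", "no_humans", "comic"}


def post_tags(row):
    """Collect all tags of a row, assuming space-separated tag strings."""
    tags = set()
    for column in TAG_COLUMNS:
        tags.update((row.get(column) or "").split())
    return tags


def exclude_tagged(rows, blocked):
    """Drop every row that carries at least one tag in `blocked`."""
    return [row for row in rows if post_tags(row).isdisjoint(blocked)]


# Hypothetical sample rows; the tags are made up for illustration.
sample_posts = [
    {"id": 1, "tag_string_general": "1girl solo", "tag_string_meta": "highres"},
    {"id": 2, "tag_string_general": "1girl", "tag_string_meta": "third-party_edit"},
    {"id": 3, "tag_string_general": "comic 1boy", "tag_string_meta": ""},
    {"id": 4, "tag_string_general": "3d_background scenery", "tag_string_meta": None},
]

usable = exclude_tagged(sample_posts, ALWAYS_EXCLUDE)
high_quality = exclude_tagged(sample_posts, ALWAYS_EXCLUDE | QUALITY_EXCLUDE)
print([p["id"] for p in usable], [p["id"] for p in high_quality])
```

Tags are compared as whole tokens, so a blocked tag such as `3d` does not match a longer tag that merely starts with the same characters.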
## Other Links
[Looking for parquet files?](https://huggingface.co/datasets/Jakaline/Danbooru2023_metadata/tree/refs%2Fconvert%2Fparquet/default/train)
Looking for images? [nyanko7/danbooru2023](https://huggingface.co/datasets/nyanko7/danbooru2023)
Provided by:
Jakaline
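Summary of original information
Danbooru2023_metadata
Overview
- Size: 6,391,111 (6.39M)
- Source: Danbooru
- Contents: metadata of posts up to post #7042183; only active posts are included, and child posts with pixel-perfect_duplicate were removed.
Columns
id, md5, created_at, updated_at, score, up_score, down_score, rating, image_width, image_height, file_ext, parent_id, duration, pixel_hash, tag_string_general, tag_string_character, tag_string_copyright, tag_string_artist, tag_string_meta
Rating
- g (general): Completely safe for work.
- s (sensitive): Probably not safe for work.
- q (questionable): Softcore erotica.
- e (explicit): Hardcore erotica. Definitely not safe for work.
Danbooru post status
- Active: Posts approved by a moderator.
- Pending: Posts waiting for approval. A post that is not approved within 3 days is deleted.
- Deleted: Posts that do not meet Danbooru's standards (non-anime, low quality, etc.).
- Banned: Posts that are copyright-claimed or contain off-limits content.
Other information
- When creating your own anime image dataset from Danbooru posts, exclude posts with the following tags: cosplay_photo third-party_edit text-only_page
- If aiming for high quality, also exclude posts with the following tags: photo_(medium) 3d no_humans comic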



