danbooru2023-metadata-database

Name: danbooru2023-metadata-database
Creator: maas
Published: 2025-12-26 15:18:37
License: No description provided

ModelScope community. Updated 2025-12-26, indexed 2025-08-23.

Download link:

https://modelscope.cn/datasets/KBlueLeaf/danbooru2023-metadata-database

Resource description:

# Metadata Database for Danbooru2023

Danbooru 2023 dataset: https://huggingface.co/datasets/nyanko7/danbooru2023

The latest entry in this database is id 7,866,491, which is newer than nyanko7's dataset.

This dataset contains a SQLite db file that holds all tag and post metadata. The Peewee ORM config file is provided too; please check it for more information, especially on how posts and tags are linked.

The original data comes from the official dump of the posts info. Check this [link](https://console.cloud.google.com/storage/browser/danbooru_public/data) for more info.

## Format

This dataset comes in 3 formats that store the same contents:

* SQLite (.db)
  * Has 2 versions: with and without index.
* Parquet
  * The Parquet file names indicate the SQLite/DuckDB table names.
  * It is recommended to use post.parquet when you need to export large amounts of content.
* DuckDB (.duckdb)
  * Has 2 versions: with and without index.

The `others` folder contains some pre-exported files, such as the tags for each post.

## Details

This section covers details you need to be aware of if you want to use another ORM system or plain SQL queries with this database.

#### Custom Enum Fields

Some fields in Post/Tag use my custom enum field to store a type or category:

* Post.rating
  * 0: general
  * 1: sensitive
  * 2: questionable
  * 3: explicit
* Tag.type
  * 0: general
  * 1: artist
  * 2: character
  * 3: copyright
  * 4: meta

#### Tag List

I use Peewee's ManyToManyField to implement the tag list. It relies on a through model that holds every (Tag, Post) pair.

Since we will very likely want to query posts by tag, many-to-many is the better fit. The downside of this design is that the database file is 1.5x larger than before (there are 0.25B post-tag pair entries), but queries become 2~3x faster, so I think the trade-off is acceptable.
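For readers who prefer plain SQL over Peewee, the through-model layout can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the table names (`post`, `tag`, `post_tag`), the column names, and the toy rows are all hypothetical, and the real schema is the one defined in db.py. Only the rating codes are taken from the enum list above.

```python
import sqlite3

# Hypothetical schema for illustration only; the real table and column
# names are defined in db.py and may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (id INTEGER PRIMARY KEY, rating INTEGER);
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT, type INTEGER);
-- Through table: one row per (post, tag) pair.
CREATE TABLE post_tag (
    post_id INTEGER REFERENCES post(id),
    tag_id INTEGER REFERENCES tag(id)
);
CREATE INDEX post_tag_by_tag ON post_tag (tag_id);
""")

# Toy rows, not real Danbooru data.
conn.executemany("INSERT INTO post VALUES (?, ?)", [(1, 0), (2, 3), (3, 0)])
conn.executemany(
    "INSERT INTO tag VALUES (?, ?, ?)",
    [(10, "landscape", 0), (11, "example_character", 2)],
)
conn.executemany(
    "INSERT INTO post_tag VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10), (3, 11)],
)

# Rating codes as documented for Post.rating.
RATING_NAMES = {0: "general", 1: "sensitive", 2: "questionable", 3: "explicit"}


def posts_with_tag(conn, tag_name):
    """Return (post id, rating name) for every post carrying the given tag."""
    rows = conn.execute(
        """
        SELECT post.id, post.rating
        FROM post
        JOIN post_tag ON post_tag.post_id = post.id
        JOIN tag ON tag.id = post_tag.tag_id
        WHERE tag.name = ?
        ORDER BY post.id
        """,
        (tag_name,),
    ).fetchall()
    return [(post_id, RATING_NAMES[rating]) for post_id, rating in rows]


print(posts_with_tag(conn, "landscape"))  # [(1, 'general'), (2, 'explicit')]
```

Looking a tag up first and then following the pair table to the posts is the access pattern the many-to-many design is meant to speed up, at the cost of storing one row per pair.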
After some checking, I can confirm that every "categorical tag list" can be obtained from the full list plus a filter, and that is how it is implemented now. Check db.py for more details.

#### Utils

If you think the details above are too complicated, just use db_utils.py and the Peewee API to work with this database. I also provide write_csv.py for exporting the whole dataset to CSV for data analysis.

## License

The database files of this repo are licensed under the MIT License. The source code files of this repo are licensed under the Apache 2.0 License.

## Acknowledgement

Thanks to AngelBottomless for updating new entries.
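As an illustration of the "full list plus filter" idea from the Tag List section, the sketch below derives a per-category tag list from a post's full tag list. It is not the code in db.py: the `TagType` class, the `tags_of_type` helper, and the sample tags are hypothetical, and only the numeric codes come from the documented Tag.type enum.

```python
from enum import IntEnum


class TagType(IntEnum):
    """Numeric codes as documented for Tag.type."""
    GENERAL = 0
    ARTIST = 1
    CHARACTER = 2
    COPYRIGHT = 3
    META = 4


def tags_of_type(full_tag_list, tag_type):
    """Filter a full list of (name, type code) pairs down to one category."""
    return [name for name, type_code in full_tag_list if type_code == tag_type]


# Toy tag list for a single post, not real Danbooru data.
full_list = [
    ("landscape", 0),
    ("example_artist", 1),
    ("example_character", 2),
    ("highres", 4),
]

print(tags_of_type(full_list, TagType.CHARACTER))  # ['example_character']
print(tags_of_type(full_list, TagType.COPYRIGHT))  # []
```

Keeping a single full list and filtering on demand avoids storing a separate relation per tag category.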

Provider:

maas

Created:

2025-08-15

Collection summary

Dataset introduction