danbooru2023-webp-4Mpixel
Source: ModelScope community (updated 2026-05-11, indexed 2025-11-03)
Download link:
https://modelscope.cn/datasets/KBlueLeaf/danbooru2023-webp-4Mpixel
Description:
# Danbooru 2023 webp: A space-efficient version of Danbooru 2023
This dataset is a resized and re-encoded version of [danbooru2023](https://huggingface.co/datasets/nyanko7/danbooru2023).<br>
Non-image and truncated files have been removed, and the images have been resized to a smaller size.
This dataset has been updated to latest_id = 7,832,883.
Thanks to DeepGHS!
**Notice**: The contents of the updates folder and deepghs/danbooru_newest-webp-4Mpixel have been merged into 2000~2999.tar. You can safely ignore everything in the updates folder!
---
## Details
This dataset uses a few methods to reduce its size and improve efficiency.
### Size and Format
All images with more than 2048x2048 pixels are resized to approximately 2048x2048 pixels using the bicubic algorithm.<br>
Images whose longer edge is still larger than 16383 after resizing are removed.<br>
(One reason is that webp does not allow such dimensions; another is that the aspect ratio of these images is too large or too small.)
All images are encoded and saved as 90% quality webp with the Pillow library in Python,
which is about half the size of 100% quality lossy webp.
The total size of this dataset is around 1.3~1.4TB, which is less than 20% of the original file size.
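
The resize and filter rules above can be sketched roughly as follows. This is a minimal illustration, not the script used to build the dataset: it assumes "more than 2048x2048 pixels" refers to the total pixel count and that the aspect ratio is preserved when scaling. The names `target_size`, `fits_webp`, and `reencode` are hypothetical.

```python
import io
import math

MAX_PIXELS = 2048 * 2048  # assumed pixel budget (about 4 megapixels)
WEBP_MAX_EDGE = 16383     # largest edge length the WebP format can store


def target_size(width, height, max_pixels=MAX_PIXELS):
    """Return (w, h) scaled down to about max_pixels, keeping the aspect ratio."""
    if width * height <= max_pixels:
        return width, height  # small enough already, leave untouched
    scale = math.sqrt(max_pixels / (width * height))
    return max(1, round(width * scale)), max(1, round(height * scale))


def fits_webp(size):
    """True if both edges are within the WebP dimension limit."""
    return max(size) <= WEBP_MAX_EDGE


def reencode(path, quality=90):
    """Return WebP bytes for the image at path, or None if it would be dropped."""
    # Pillow is imported here so the size helpers above work without it.
    from PIL import Image

    with Image.open(path) as img:
        size = target_size(*img.size)
        if not fits_webp(size):
            return None  # extreme aspect ratio: longer edge still too large
        if size != img.size:
            img = img.resize(size, Image.BICUBIC)
        buf = io.BytesIO()
        img.save(buf, format="WEBP", quality=quality)
        return buf.getvalue()
```

Under these assumptions a 4096x4096 image becomes 2048x2048, while a 100000x100 strip is still far wider than 16383 after scaling and is dropped.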
### Webdataset
This dataset uses the webdataset library to write all the tar files, so you can also use webdataset to load them easily. This is the recommended way.
The `__key__` of each file is its id. You can use this id to query the [metadata database](https://huggingface.co/datasets/KBlueLeaf/danbooru2023-sqlite) easily.
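
To make the `__key__` convention concrete, here is a dependency-free sketch that reads one shard with the standard `tarfile` module and derives each key the way webdataset generally does (the member path up to the first dot of the file name). It is an illustration only: `split_key` and `iter_shard` are hypothetical helpers, and the `.webp` extension of the stored files is an assumption. With webdataset itself, the same shard would normally be opened with `webdataset.WebDataset(path)`.

```python
import tarfile


def split_key(member_name):
    """Split 'dir/123.webp' into ('dir/123', 'webp'); None if there is no extension."""
    directory, _, base = member_name.rpartition("/")
    stem, dot, ext = base.partition(".")  # the extension starts at the first dot
    if not dot or not stem:
        return None
    key = f"{directory}/{stem}" if directory else stem
    return key, ext


def iter_shard(tar_path):
    """Yield (key, extension, raw_bytes) for every regular file in one shard."""
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and other special entries
            parts = split_key(member.name)
            if parts is None:
                continue  # no extension, so no key can be derived
            key, ext = parts
            yield key, ext, tar.extractfile(member).read()
```

For a shard on disk (the file name here is hypothetical), `for key, ext, data in iter_shard("2000.tar")` yields the raw WebP bytes together with the key. If the keys are plain numeric ids, `int(key)` gives the value to look up in the metadata database.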
Provider:
maas
Created:
2025-12-31



