
danbooru2023-webp-4Mpixel

ModelScope Community · Updated 2026-05-11 · Indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/KBlueLeaf/danbooru2023-webp-4Mpixel
Description:
# Danbooru 2023 webp: A space-efficient version of Danbooru 2023

This dataset is a resized and re-encoded version of [danbooru2023](https://huggingface.co/datasets/nyanko7/danbooru2023). It removes the non-image and truncated files and resizes all remaining images to a smaller size.

This dataset has been updated to latest_id = 7,832,883. Thanks to DeepGHS!

**Notice**: the contents of the `updates` folder and of deepghs/danbooru_newest-webp-4Mpixel have been merged into 2000~2999.tar. You can safely ignore everything in the `updates` folder.

---

## Details

This dataset uses a few methods to reduce its size and improve efficiency.

### Size and Format

Every image with more than 2048x2048 pixels is resized to about 2048x2048 pixels (roughly 4M pixels in total) using bicubic interpolation.

After resizing, every image whose longer edge exceeds 16383 pixels is removed. There are two reasons for this. WebP does not support dimensions that large, and such images have an extreme aspect ratio.

All images are encoded as lossy WebP at quality 90 with the Pillow library in Python. This produces files about half the size of quality-100 lossy WebP.

The total size of this dataset is around 1.3~1.4 TB, less than 20% of the original file size.

### WebDataset

All tar files were written with the webdataset library, so you can also load them easily with webdataset. This is the recommended way.

The `__key__` of each file is its image ID. You can use this ID to query the [metadata database](https://huggingface.co/datasets/KBlueLeaf/danbooru2023-sqlite).
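
The resizing and filtering rules under Size and Format can be illustrated with a short Python sketch. It is not the script used to build the dataset. It assumes that "near 2048x2048 pixels" means scaling the total area down to about 2048x2048 while preserving the aspect ratio. That reading is consistent with the 4Mpixel name and with the post-resize 16383-pixel edge check. The exact rounding the author used is not documented, and the helper names `target_size` and `encode_webp` are hypothetical. Encoding uses Pillow's `Image.resize` with bicubic resampling and `Image.save(..., "WEBP", quality=90)`.

```python
import math
from typing import Optional, Tuple

# Assumption: "near 2048x2048 pixels" means a total area of about 2048 * 2048
# (~4.2M) pixels, reached by uniform scaling that keeps the aspect ratio.
TARGET_PIXELS = 2048 * 2048
# WebP cannot store images wider or taller than 16383 pixels.
WEBP_MAX_SIDE = 16383


def target_size(width: int, height: int) -> Optional[Tuple[int, int]]:
    """Return the (width, height) to encode, or None if the image is dropped."""
    if width * height > TARGET_PIXELS:
        # Scale both sides by the same factor so the area shrinks to ~TARGET_PIXELS.
        scale = math.sqrt(TARGET_PIXELS / (width * height))
        width = max(1, round(width * scale))
        height = max(1, round(height * scale))
    if max(width, height) > WEBP_MAX_SIDE:
        return None  # too long for WebP, and the aspect ratio is extreme anyway
    return width, height


def encode_webp(src_path: str, dst_path: str, quality: int = 90) -> bool:
    """Hypothetical helper: bicubic resize + lossy WebP save; False if dropped."""
    # Imported lazily so target_size() can be used without Pillow installed.
    # Image.Resampling requires Pillow >= 9.1.
    from PIL import Image

    with Image.open(src_path) as img:
        size = target_size(*img.size)
        if size is None:
            return False
        if size != img.size:
            img = img.resize(size, Image.Resampling.BICUBIC)
        img.save(dst_path, "WEBP", quality=quality)
    return True
```

Keeping the size computation free of Pillow makes it easy to check which images would be dropped before decoding anything. For example, a 100000x1000 image scales to about 20480x205 pixels, so its longer edge exceeds 16383 and it is removed.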

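The README recommends loading the tar shards with the webdataset library. That library yields one dict per sample containing `__key__` plus one entry per file extension. To show how `__key__` relates to the files inside a tar, here is a simplified, stdlib-only sketch of that grouping convention. It assumes each image is stored as `<id>.webp` at the top level of the archive. That naming and the helper `iter_samples` are assumptions for illustration, not part of the dataset's documented layout.

```python
import tarfile
from typing import Dict, Iterator, Union

Sample = Dict[str, Union[str, bytes]]


def iter_samples(tar: tarfile.TarFile) -> Iterator[Sample]:
    """Group consecutive tar members into WebDataset-style sample dicts.

    The key is the member path up to the first "." of the file name, so
    "7832883.webp" becomes {"__key__": "7832883", "webp": <bytes>}.
    """
    current: Sample = {}
    for member in tar:
        if not member.isfile():
            continue
        directory, _, name = member.name.rpartition("/")
        stem, dot, ext = name.partition(".")
        if not dot:
            continue  # files without an extension belong to no sample
        key = f"{directory}/{stem}" if directory else stem
        if current and current["__key__"] != key:
            yield current  # a new key starts the next sample
            current = {}
        current["__key__"] = key
        fileobj = tar.extractfile(member)
        current[ext] = fileobj.read() if fileobj is not None else b""
    if current:
        yield current
```

As a usage example, open a shard with `with tarfile.open(path) as tar:` and iterate `for sample in iter_samples(tar):`. Then `int(sample["__key__"])` gives the ID to look up in the metadata database. For real training pipelines, the webdataset library itself remains the simpler choice.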
Provider:
maas
Created:
2025-12-31