
Birchlabs/sdxl-latents-artbench

Source: Hugging Face. Updated 2023-11-22; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Birchlabs/sdxl-latents-artbench
Overview:
This dataset contains ArtBench samples encoded to float16 SDXL latents via the Ollin VAE. It was created with the script make_sdxl_latent_dataset.py. The mean and logvar were not saved, because the variance is low enough that preserving them was not worth doubling the file size. Instead, each latent was sampled from the diagonal Gaussian distribution and the resulting sample was saved. The original images are retained alongside the latents. The source images come from WikiArt, the Ukiyo-e.org database, and The Surrealism Website, and are used under Fair Use. The dataset has a training set of 50,000 samples and a test set of 10,000 samples.
Provider:
Birchlabs
Summary of original information

ArtBench dataset overview

Dataset creation

  • The dataset was created by encoding ArtBench samples to float16 SDXL latents, using the Ollin VAE as the encoder.
  • Dataset creation script: make_sdxl_latent_dataset.py

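Dataset contents

  • The mean and logvar were not saved, because the variance is low enough that preserving them was not worth doubling the file size.
  • Each latent was sampled from the diagonal Gaussian distribution, and the resulting sample was saved.
  • The original images are retained.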
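
To illustrate what "sampled from the diagonal Gaussian" means, here is a minimal sketch of the reparameterised draw z = mean + exp(0.5 * logvar) * eps, applied independently per element. The function name `sample_diagonal_gaussian` and the example numbers are hypothetical and are not taken from make_sdxl_latent_dataset.py; the sketch uses plain Python lists rather than tensors.

```python
import math
import random


def sample_diagonal_gaussian(mean, logvar, rng):
    """Draw one sample per element from N(mean, exp(logvar)).

    Hypothetical helper for illustration only. Each element is independent,
    which is what "diagonal" refers to.
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, logvar)
    ]


rng = random.Random(0)
mean = [0.5, -1.0, 2.0, 0.0]          # made-up posterior means
logvar = [-20.0, -20.0, -20.0, -20.0]  # made-up, very low variance
sample = sample_diagonal_gaussian(mean, logvar, rng)
print(sample)
```

When logvar is strongly negative the sample stays very close to the mean, which is the rationale given above for saving only the sampled latent.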

Dataset structure

```python
from typing import TypedDict, Iterator
from webdataset import WebDataset

Sample = TypedDict('Sample', {
    '__key__': str,
    '__url__': str,
    'cls.txt': bytes,     # UTF-8 encoded class ID, from 0 to 9
    'img.png': bytes,     # serialized PIL image, 256x256 pixels
    'latent.pth': bytes,  # serialized FloatTensor, 32x32 latents
})

it: Iterator[Sample] = WebDataset('train/{00000..00004}.tar')

for sample in it:
    pass
```
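
Each field in a sample arrives as raw bytes, so it has to be decoded before use. The sketch below shows one way to do that. The helper names are hypothetical, and it assumes (this is not confirmed by the listing) that `latent.pth` was written with `torch.save`, so that `torch.load` can read it back. `torch` and `PIL` are imported lazily so the class-ID helper works without them.

```python
import io
from typing import Any, Mapping


def decode_class_id(sample: Mapping[str, Any]) -> int:
    """Parse the UTF-8 class ID stored under 'cls.txt' (expected 0 to 9)."""
    class_id = int(sample["cls.txt"].decode("utf-8").strip())
    if not 0 <= class_id <= 9:
        raise ValueError(f"class ID out of range: {class_id}")
    return class_id


def decode_latent(sample: Mapping[str, Any]):
    """Deserialize the latent tensor. Assumes it was saved with torch.save."""
    import torch  # imported lazily; only needed for latents
    return torch.load(io.BytesIO(sample["latent.pth"]), map_location="cpu")


def decode_image(sample: Mapping[str, Any]):
    """Open the PNG bytes as a PIL image."""
    from PIL import Image  # imported lazily; only needed for images
    return Image.open(io.BytesIO(sample["img.png"]))


# Usage with a hand-made stand-in for a real sample:
print(decode_class_id({"cls.txt": b"7"}))
```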

Dataset size

  • Training set: 50,000 samples
  • Test set: 10,000 samples

Statistics

Test set

  • Mean (test/avg/val.pt): [-0.11362826824188232, -0.7059057950973511, 0.4819808006286621, 2.2327630519866943]

  • Mean of squares (test/avg/sq.pt): [52.59075927734375, 30.115631103515625, 44.977020263671875, 30.228885650634766]

  • Standard deviation (std): [7.251058578491211, 5.442180633544922, 6.689148902893066, 5.024306297302246]

  • Reciprocal of standard deviation (1/std): [0.1379109025001526, 0.18374986946582794, 0.14949584007263184, 0.19903245568275452]

Training set

  • Mean (train/avg/val.pt): [-0.1536690890789032, -0.7142514586448669, 0.4706766605377197, 2.24863600730896]

  • Mean of squares (train/avg/sq.pt): [51.99677276611328, 30.184646606445312, 44.909732818603516, 30.234216690063477]

  • Standard deviation (std): [7.2092413902282715, 5.447429656982422, 6.68492317199707, 5.017753601074219]

  • Reciprocal of standard deviation (1/std): [0.1387108564376831, 0.18357281386852264, 0.14959034323692322, 0.1992923617362976]
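
The four-element lists are per-channel values, and the listed standard deviations agree numerically with std = sqrt(E[x^2] - E[x]^2) computed from the val.pt and sq.pt figures. The sketch below reproduces the training-set std from those two lists and shows how such statistics could be used to standardize a latent. The function names are hypothetical, and using these statistics for normalization is an assumption about their intended purpose.

```python
import math

# Per-channel training-set statistics copied from the figures listed above.
TRAIN_MEAN = [-0.1536690890789032, -0.7142514586448669,
              0.4706766605377197, 2.24863600730896]
TRAIN_MEAN_SQ = [51.99677276611328, 30.184646606445312,
                 44.909732818603516, 30.234216690063477]
TRAIN_STD_LISTED = [7.2092413902282715, 5.447429656982422,
                    6.68492317199707, 5.017753601074219]


def channel_std(mean, mean_sq):
    """Per-channel population std from the mean and the mean of squares."""
    return [math.sqrt(sq - m * m) for m, sq in zip(mean, mean_sq)]


def standardize(values, mean, std):
    """Standardize one value per channel: (x - mean) / std."""
    return [(x - m) / s for x, m, s in zip(values, mean, std)]


train_std = channel_std(TRAIN_MEAN, TRAIN_MEAN_SQ)
for computed, listed in zip(train_std, TRAIN_STD_LISTED):
    print(f"computed={computed:.6f} listed={listed:.6f}")
```

Multiplying by the listed 1/std values is equivalent to the division in `standardize` and avoids a per-element divide.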
