yuanchenyang/imagenet-256-sd-vae-ft-mse-latents
Source: Hugging Face | Updated: 2026-03-26 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/yuanchenyang/imagenet-256-sd-vae-ft-mse-latents
Resource description:
---
license: other
license_name: imagenet
license_link: https://image-net.org/download
task_categories:
- image-classification
tags:
- imagenet
- latent-diffusion
- vae
- sd-vae-ft-mse
size_categories:
- 1M<n<10M
---
# ImageNet-256 SD-VAE-ft-MSE Latents
Pre-computed **posterior means** (no variance/std) from the [Stable Diffusion VAE (`stabilityai/sd-vae-ft-mse`)](https://huggingface.co/stabilityai/sd-vae-ft-mse) for the full ImageNet-1K training set at 256×256 resolution, stored as Parquet shards. Each example includes latents for both the **original and horizontally flipped** image, enabling flip augmentation without re-encoding at training time.
## Dataset Description
Each example contains:
| Column | Shape | Type | Description |
|---|---|---|---|
| `latent_mean` | `(4, 32, 32)` | `float32` | Posterior mean of the original image |
| `latent_mean_flip` | `(4, 32, 32)` | `float32` | Posterior mean of the horizontally flipped image |
| `label` | scalar | `int64` | ImageNet class label (0–999) |
- **Number of examples**: 1,281,167 (full ImageNet-1K train split)
- **Latent spatial size**: 32×32 (8× downsampled from 256×256 pixels)
- **Latent channels**: 4
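
As a rough check of the figures above, the uncompressed footprint of the latents follows directly from the shapes and dtypes in the table. The sketch below is plain arithmetic and ignores the small `label` column; the actual size of the Parquet shards on disk is not stated here and will depend on encoding and compression.

```python
# Back-of-the-envelope size of the latents, using only the figures listed above.
NUM_EXAMPLES = 1_281_167          # ImageNet-1K train split
IMAGE_SIZE = 256                  # pixels per side
DOWNSAMPLE = 8                    # VAE spatial downsampling factor
LATENT_CHANNELS = 4
BYTES_PER_FLOAT32 = 4
LATENTS_PER_EXAMPLE = 2           # latent_mean and latent_mean_flip

latent_side = IMAGE_SIZE // DOWNSAMPLE
values_per_latent = LATENT_CHANNELS * latent_side * latent_side
bytes_per_example = values_per_latent * BYTES_PER_FLOAT32 * LATENTS_PER_EXAMPLE
total_bytes = bytes_per_example * NUM_EXAMPLES

print(latent_side)                       # 32
print(values_per_latent)                 # 4096
print(bytes_per_example)                 # 32768
print(f"{total_bytes / 2**30:.1f} GiB")  # 39.1 GiB
```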
## Creation
Images were center-cropped and resized to 256×256 using the [Dhariwal (ADM) cropping method](https://github.com/openai/guided-diffusion), then encoded with `stabilityai/sd-vae-ft-mse`. Only the posterior **mean** is stored (not variance), for both the original and horizontally flipped image. Pixels are normalised to `[-1, 1]` before encoding, consistent with DiT / SiT conventions.
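
The pixel normalisation and horizontal flip described above can be sketched as follows. This is a minimal NumPy illustration, not the preprocessing script itself: `to_model_range` and `hflip` are hypothetical helper names, the input is assumed to be a `uint8` array in height × width × channel layout, and the crop/resize and VAE encoding steps are omitted.

```python
import numpy as np

def to_model_range(pixels_u8: np.ndarray) -> np.ndarray:
    """Map uint8 pixels in [0, 255] to float32 values in [-1, 1]."""
    return pixels_u8.astype(np.float32) / 127.5 - 1.0

def hflip(image_hwc: np.ndarray) -> np.ndarray:
    """Mirror an (H, W, C) image left to right."""
    return image_hwc[:, ::-1, :]

# Stand-in for one cropped 256x256 RGB image (random data, illustration only).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)

x = to_model_range(image)              # would be encoded into latent_mean
x_flip = to_model_range(hflip(image))  # would be encoded into latent_mean_flip
```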
## Usage
```python
from datasets import load_dataset
ds = load_dataset("yuanchenyang/imagenet-256-sd-vae-ft-mse-latents", split="train")
example = ds.with_format("numpy")[0]  # without a format, arrays come back as nested lists
latent = example["latent_mean"]  # numpy array (4, 32, 32)
latent_flip = example["latent_mean_flip"]  # numpy array (4, 32, 32)
label = example["label"]  # integer class index (0–999)
```
## Intended Use
Training latent diffusion models (e.g., DiT, SiT, SR-DiT) on ImageNet-256 without needing to run the VAE encoder during training.
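
One way to use the two stored latents for flip augmentation is to pick one of them at random each time an example is drawn. The sketch below assumes examples are dictionaries keyed by the column names in the table above; `sample_latent` is a hypothetical helper, and the placeholder arrays stand in for real rows.

```python
import random
import numpy as np

def sample_latent(example: dict, rng: random.Random, p_flip: float = 0.5):
    """Return (latent, label), choosing the flipped latent with probability p_flip."""
    key = "latent_mean_flip" if rng.random() < p_flip else "latent_mean"
    return np.asarray(example[key], dtype=np.float32), int(example["label"])

# Placeholder row with the documented shapes (zeros and ones, not real latents).
fake_example = {
    "latent_mean": np.zeros((4, 32, 32), dtype=np.float32),
    "latent_mean_flip": np.ones((4, 32, 32), dtype=np.float32),
    "label": 7,
}

rng = random.Random(0)
latent, label = sample_latent(fake_example, rng)
```

Because both latents are already stored, this selection replaces the usual pixel-space flip and needs no VAE forward pass.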
## Source Code
Preprocessing scripts are based on: <https://github.com/Martinser/REG/tree/main/preprocessing>
## License
The latent representations inherit the [ImageNet license terms](https://image-net.org/download). The VAE weights are from Stability AI's `sd-vae-ft-mse` (CreativeML Open RAIL-M license). The preprocessing code is licensed under the MIT license.
Provided by:
yuanchenyang



