yuanchenyang/imagenet-256-sd-vae-ft-mse-latents

Name: yuanchenyang/imagenet-256-sd-vae-ft-mse-latents
Creator: yuanchenyang
Published: 2026-03-26 15:59:39
License: No description provided

Source: Hugging Face. Updated 2026-03-26; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/yuanchenyang/imagenet-256-sd-vae-ft-mse-latents

Description:

---
license: other
license_name: imagenet
license_link: https://image-net.org/download
task_categories:
- image-classification
tags:
- imagenet
- latent-diffusion
- vae
- sd-vae-ft-mse
size_categories:
- 1M<n<10M
---

# ImageNet-256 SD-VAE-ft-MSE Latents

Pre-computed **posterior means** (no variance/std) from the [Stable Diffusion VAE (`stabilityai/sd-vae-ft-mse`)](https://huggingface.co/stabilityai/sd-vae-ft-mse) for the full ImageNet-1K training set at 256×256 resolution, stored as Parquet shards.

Each example includes latents for both the **original and horizontally flipped** image, enabling flip augmentation without re-encoding at training time.

## Dataset Description

Each example contains:

| Column | Shape | Type | Description |
|---|---|---|---|
| `latent_mean` | `(4, 32, 32)` | `float32` | Posterior mean of the original image |
| `latent_mean_flip` | `(4, 32, 32)` | `float32` | Posterior mean of the horizontally flipped image |
| `label` | scalar | `int64` | ImageNet class label (0–999) |

- **Number of examples**: 1,281,167 (full ImageNet-1K train split)
- **Latent spatial size**: 32×32 (8× downsampled from 256×256 pixels)
- **Latent channels**: 4

## Creation

Images were center-cropped and resized to 256×256 using the [Dhariwal (ADM) cropping method](https://github.com/openai/guided-diffusion), then encoded with `stabilityai/sd-vae-ft-mse`. Only the posterior **mean** is stored (not variance), for both the original and horizontally flipped image. Pixels are normalised to `[-1, 1]` before encoding, consistent with DiT / SiT conventions.
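
The cropping and normalisation steps described under Creation can be sketched roughly as follows. This is an illustration only: `center_crop_arr` follows the ADM-style center crop as commonly reproduced from the guided-diffusion repository, and `to_vae_input` is a hypothetical helper name. Neither has been checked against the preprocessing scripts actually used for this dataset, and the VAE encoding step itself is not shown.

```python
import numpy as np
from PIL import Image


def center_crop_arr(pil_image, image_size=256):
    """ADM-style center crop (sketch): shrink, resize short side, crop center."""
    # Halve the image with a box filter while the short side is at least
    # twice the target size, which reduces aliasing on very large images.
    while min(*pil_image.size) >= 2 * image_size:
        pil_image = pil_image.resize(
            tuple(x // 2 for x in pil_image.size), resample=Image.BOX
        )
    # Bicubic resize so that the short side equals image_size.
    scale = image_size / min(*pil_image.size)
    pil_image = pil_image.resize(
        tuple(round(x * scale) for x in pil_image.size), resample=Image.BICUBIC
    )
    # Crop the central image_size x image_size window.
    arr = np.array(pil_image)
    crop_y = (arr.shape[0] - image_size) // 2
    crop_x = (arr.shape[1] - image_size) // 2
    return arr[crop_y:crop_y + image_size, crop_x:crop_x + image_size]


def to_vae_input(pil_image, image_size=256):
    """Hypothetical helper: crop, scale uint8 pixels to [-1, 1], return CHW float32."""
    arr = center_crop_arr(pil_image.convert("RGB"), image_size)
    x = arr.astype(np.float32) / 127.5 - 1.0  # 0 -> -1.0, 255 -> 1.0
    return x.transpose(2, 0, 1)  # (H, W, C) -> (C, H, W)
```

The flipped variant would presumably be produced by flipping the cropped image horizontally before encoding, which is not the same as flipping the latent of the original image.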
## Usage

```python
from datasets import load_dataset

ds = load_dataset("yuanchenyang/imagenet-256-sd-vae-ft-mse-latents", split="train")
example = ds[0]

latent = example["latent_mean"]            # numpy array (4, 32, 32)
latent_flip = example["latent_mean_flip"]  # numpy array (4, 32, 32)
label = example["label"]                   # int
```

## Intended Use

Training latent diffusion models (e.g., DiT, SiT, SR-DiT) on ImageNet-256 without needing to run the VAE encoder during training.

## Source Code

Preprocessing scripts are based on: <https://github.com/Martinser/REG/tree/main/preprocessing>

## License

The latent representations inherit the [ImageNet license terms](https://image-net.org/download). The VAE weights are from Stability AI's `sd-vae-ft-mse` (CreativeML Open RAIL-M license). The preprocessing code is licensed under the MIT license.
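
For the training use case described under Intended Use, the two latent columns make horizontal-flip augmentation a per-sample choice between stored arrays. The sketch below shows one way this might look; `pick_latent` and `make_batch` are hypothetical helper names, not part of the dataset or of any library, and the records are assumed to be dict-like with the three columns listed in the table. Values are passed through `np.asarray` so the code works whether the loader returns nested lists or arrays.

```python
import numpy as np


def pick_latent(example, rng, p_flip=0.5):
    """Return (latent, label), choosing the flipped latent with probability p_flip."""
    key = "latent_mean_flip" if rng.random() < p_flip else "latent_mean"
    z = np.asarray(example[key], dtype=np.float32)  # expected shape (4, 32, 32)
    return z, int(example["label"])


def make_batch(examples, rng, p_flip=0.5):
    """Stack per-example choices into (B, 4, 32, 32) latents and (B,) labels."""
    pairs = [pick_latent(ex, rng, p_flip) for ex in examples]
    latents = np.stack([z for z, _ in pairs])
    labels = np.asarray([y for _, y in pairs], dtype=np.int64)
    return latents, labels


if __name__ == "__main__":
    # Synthetic stand-in for a dataset record (assumption: same keys and shapes).
    rng = np.random.default_rng(0)
    fake = {
        "latent_mean": np.zeros((4, 32, 32), dtype=np.float32),
        "latent_mean_flip": np.ones((4, 32, 32), dtype=np.float32),
        "label": 7,
    }
    latents, labels = make_batch([fake] * 8, rng)
    print(latents.shape, labels.shape)
```

Whether the stored means should additionally be rescaled before being fed to a diffusion model is not stated in the dataset card, so that step is left out here.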

Provider:

yuanchenyang