akrao9/512t2ilatent
Hugging Face dataset. Updated 2026-04-18; indexed 2026-04-26.
Download link: https://hf-mirror.com/datasets/akrao9/512t2ilatent
Description:
---
pretty_name: Fine-T2I 512 Latent Cache
license: apache-2.0
language:
- en
size_categories:
- 1M<n<10M
source_datasets:
- extended
task_categories:
- text-to-image
tags:
- image
- text
- image-generation
- t2i
- webdataset
- diffusion
- latent-cache
- dc-ae
- t5
---
# Fine-T2I 512 Latent Cache
This dataset is a precomputed latent cache built from the `synthetic_enhanced_prompt_random_resolution` subset of [`ma-xu/fine-t2i`](https://huggingface.co/datasets/ma-xu/fine-t2i).
It is intended to speed up text-to-image training: because image latents and text embeddings are stored on disk, the training loop does not need to run the image encoder or the text encoder.
## What This Repo Contains
Each WebDataset sample contains:
- `latents.npy`: `float16` array with shape `[32, 16, 16]`
- `text.npy`: `float16` array with shape `[384, 768]`
- `text_mask.npy`: `uint8` array with shape `[384]`
- `caption.txt`: UTF-8 caption text
- `meta.json`: metadata including subset, sample index, and aesthetic score

Additional files in the repo:
- `manifest.json`: cache metadata and preprocessing settings
- `null_text.npy`: unconditional T5 embedding for the empty prompt
- `null_mask.npy`: T5 attention mask for the empty prompt
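
The per-sample layout listed above can be checked before training with a small validation helper. This is a minimal sketch: the function name `check_sample` and the synthetic `fake` sample are illustrative and not part of the repo, and the keys are assumed to match what WebDataset produces after `.decode()`.

```python
import numpy as np

# Expected (dtype, shape) per array, taken from the sample layout on this card.
EXPECTED = {
    "latents.npy": (np.float16, (32, 16, 16)),
    "text.npy": (np.float16, (384, 768)),
    "text_mask.npy": (np.uint8, (384,)),
}


def check_sample(sample: dict) -> list[str]:
    """Return a list of problems; an empty list means the sample matches."""
    problems = []
    for key, (dtype, shape) in EXPECTED.items():
        if key not in sample:
            problems.append(f"missing {key}")
            continue
        arr = np.asarray(sample[key])
        if arr.dtype != dtype:
            problems.append(f"{key}: dtype {arr.dtype}, expected {np.dtype(dtype)}")
        if arr.shape != shape:
            problems.append(f"{key}: shape {arr.shape}, expected {shape}")
    return problems


# Synthetic stand-in for a decoded sample (not real data from the cache).
fake = {
    "latents.npy": np.zeros((32, 16, 16), dtype=np.float16),
    "text.npy": np.zeros((384, 768), dtype=np.float16),
    "text_mask.npy": np.ones((384,), dtype=np.uint8),
}
print(check_sample(fake))
```

Running the check on the first few samples of each shard is usually enough to catch a truncated download or a mismatched cache version.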
## Preprocessing
Image preprocessing used for this cache:
- resize to `512`
- center crop to `512 x 512`
- encode with `mit-han-lab/dc-ae-f32c32-sana-1.1-diffusers`

Text preprocessing used for this cache:
- encoder: `google-t5/t5-base`
- max sequence length: `384`
- hidden size: `768`
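
The image settings above determine the latent shape. The sketch below works through the geometry under two assumptions that the card does not state explicitly: that "resize to `512`" means scaling the shorter side to 512 while keeping the aspect ratio, and that `f32c32` in the autoencoder name stands for a 32x spatial downsampling factor and 32 latent channels. The function names are illustrative.

```python
def resize_and_crop_geometry(width: int, height: int, target: int = 512):
    """Return (resized_size, crop_box) for a shorter-side resize plus center crop.

    ASSUMPTION: the shorter side is scaled to `target`, aspect ratio preserved.
    """
    scale = target / min(width, height)
    new_w = max(target, round(width * scale))
    new_h = max(target, round(height * scale))
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    # Box uses the (left, upper, right, lower) convention.
    return (new_w, new_h), (left, top, left + target, top + target)


def latent_shape(target: int = 512, downsample: int = 32, channels: int = 32):
    """Latent shape [C, H, W] for the assumed downsampling factor and channel count."""
    return (channels, target // downsample, target // downsample)


print(resize_and_crop_geometry(1024, 768))
print(latent_shape())
```

Under these assumptions a `512 x 512` crop maps to a `[32, 16, 16]` latent, which agrees with the shape listed for `latents.npy`.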
## Dataset Stats
- subset: `synthetic_enhanced_prompt_random_resolution`
- samples written: `1,611,313`
- image latent shape: `[32, 16, 16]`
- text embedding shape: `[384, 768]`
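
The shapes and sample count above give a rough idea of how much array data the cache holds. The estimate below counts only raw array payload; it ignores `.npy` headers, captions, `meta.json`, and tar overhead, so the real download is somewhat larger.

```python
from math import prod

# (shape, bytes per element) for each cached array, from the stats on this card.
ARRAYS = {
    "latents.npy": ((32, 16, 16), 2),   # float16
    "text.npy": ((384, 768), 2),        # float16
    "text_mask.npy": ((384,), 1),       # uint8
}
NUM_SAMPLES = 1_611_313


def payload_bytes(arrays=ARRAYS):
    """Raw payload size in bytes for each array of one sample."""
    return {name: prod(shape) * itemsize for name, (shape, itemsize) in arrays.items()}


per_sample = payload_bytes()
per_sample_total = sum(per_sample.values())
total = per_sample_total * NUM_SAMPLES

for name, size in per_sample.items():
    print(f"{name}: {size} bytes ({size / per_sample_total:.1%})")
print(f"per sample: {per_sample_total} bytes")
print(f"whole cache: {total / 1e12:.2f} TB of array payload")
```

The text embeddings account for about 97% of each sample, so a pipeline that is I/O bound will spend most of its bandwidth on `text.npy` rather than on the image latents.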
## Intended Use
This dataset is intended for:
- training or fine-tuning text-to-image models from cached latents
- faster experimentation than raw image streaming
- classifier-free guidance training using cached null text embeddings

This dataset is not intended to replace the original source dataset for tasks that require raw images or different crop / resize policies.
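
For the classifier-free guidance use case, a common approach is to replace the text condition with the null embedding for a random fraction of each batch. The sketch below illustrates that idea with NumPy. It assumes `null_text.npy` and `null_mask.npy` have the same per-sample shapes as `text.npy` and `text_mask.npy` (the card does not state their shapes), and the function name and the `drop_prob=0.1` default are illustrative choices, not settings taken from this repo.

```python
import numpy as np


def drop_text_condition(text, mask, null_text, null_mask, drop_prob=0.1, rng=None):
    """Swap each sample's text condition for the null one with probability drop_prob.

    text: [B, L, D], mask: [B, L]; null_text: [L, D], null_mask: [L] (assumed shapes).
    Returns the mixed embeddings, the mixed masks, and the boolean drop vector.
    """
    rng = rng or np.random.default_rng()
    drop = rng.random(text.shape[0]) < drop_prob  # [B] boolean
    text_out = np.where(drop[:, None, None], null_text[None], text)
    mask_out = np.where(drop[:, None], null_mask[None], mask)
    return text_out, mask_out, drop


# Tiny stand-in shapes for illustration; the real cache uses L=384, D=768.
B, L, D = 4, 8, 4
text = np.ones((B, L, D), dtype=np.float16)
mask = np.ones((B, L), dtype=np.uint8)
null_text = np.zeros((L, D), dtype=np.float16)
null_mask = np.zeros((L,), dtype=np.uint8)

mixed_text, mixed_mask, dropped = drop_text_condition(
    text, mask, null_text, null_mask, drop_prob=0.5, rng=np.random.default_rng(0)
)
print(dropped, mixed_text.shape, mixed_mask.shape)
```

If the null arrays turn out to carry a leading batch dimension, squeeze it before calling the helper.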
## Load Example
### Streaming with `webdataset`
```python
import torch
import numpy as np
import matplotlib.pyplot as plt
import webdataset as wds
from huggingface_hub import HfFileSystem, get_token, hf_hub_url
from diffusers import AutoencoderDC
from diffusers.image_processor import VaeImageProcessor

# 1. Setup WebDataset with Hub Authentication
fs = HfFileSystem()

# Glob all shards from the repo
files = [fs.resolve_path(path) for path in fs.glob("hf://datasets/akrao9/512t2ilatent/**/train-*.tar")]
urls = [hf_hub_url(f.repo_id, f.path_in_repo, repo_type="dataset") for f in files]

# Construct the pipe command for each shard
# We use 'pipe:curl' to inject the HF token into the request header
wds_urls = [f"pipe:curl -s -L -H 'Authorization:Bearer {get_token()}' {url}" for url in urls]

# Create the dataset pipeline
dataset = (
    wds.WebDataset(wds_urls)
    .shuffle(100)  # Optional: shuffle buffer
    .decode()      # Decodes bytes into numpy/PIL based on extension
)

# Grab a single sample
sample = next(iter(dataset))

# 2. Setup Device and Processor
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_dtype = torch.bfloat16 if device.type == "cuda" else torch.float32
processor = VaeImageProcessor()

# 3. Load DC-AE (Sana-1.1)
dc_ae = AutoencoderDC.from_pretrained(
    "mit-han-lab/dc-ae-f32c32-sana-1.1-diffusers",
    torch_dtype=model_dtype,
).to(device).eval()

# 4. Prepare Latents
# WebDataset .decode() automatically handles .npy files as numpy arrays
latents_np = sample["latents.npy"]
latents = torch.from_numpy(latents_np).unsqueeze(0).to(device, dtype=model_dtype).contiguous()

# 5. Inference & Post-Processing
with torch.inference_mode():
    # Scale latents and decode
    raw_output = dc_ae.decode(latents / dc_ae.config.scaling_factor).sample
    image = processor.postprocess(raw_output, output_type="np")[0]

# 6. Display
plt.figure(figsize=(6, 6))
plt.imshow(image)
plt.axis("off")

# Handle potential missing captions or formatting
caption = sample.get("caption.txt", "No Caption Found")
plt.title(caption[:100] + "..." if len(caption) > 100 else caption)
plt.show()
```
Provider: akrao9



