
akrao9/512t2ilatent

Hugging Face | Updated 2026-04-18 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/akrao9/512t2ilatent
Resource description:
---
pretty_name: Fine-T2I 512 Latent Cache
license: apache-2.0
language:
- en
size_categories:
- 1M<n<10M
source_datasets:
- extended
task_categories:
- text-to-image
tags:
- image
- text
- image-generation
- t2i
- webdataset
- diffusion
- latent-cache
- dc-ae
- t5
---

# Fine-T2I 512 Latent Cache

This dataset is a precomputed latent cache built from the `synthetic_enhanced_prompt_random_resolution` subset of [`ma-xu/fine-t2i`](https://huggingface.co/datasets/ma-xu/fine-t2i).

It is intended for faster text-to-image training by avoiding repeated image encoding and text encoding during training.

## What This Repo Contains

Each WebDataset sample contains:

- `latents.npy`: `float16` array with shape `[32, 16, 16]`
- `text.npy`: `float16` array with shape `[384, 768]`
- `text_mask.npy`: `uint8` array with shape `[384]`
- `caption.txt`: UTF-8 caption text
- `meta.json`: metadata including subset, sample index, and aesthetic score

Additional files in the repo:

- `manifest.json`: cache metadata and preprocessing settings
- `null_text.npy`: unconditional T5 embedding for empty prompt
- `null_mask.npy`: unconditional T5 attention mask

## Preprocessing

Image preprocessing used for this cache:

- resize to `512`
- center crop to `512 x 512`
- encode with `mit-han-lab/dc-ae-f32c32-sana-1.1-diffusers`

Text preprocessing used for this cache:

- encoder: `google-t5/t5-base`
- max sequence length: `384`
- hidden size: `768`

## Dataset Stats

- subset: `synthetic_enhanced_prompt_random_resolution`
- samples written: `1,611,313`
- image latent shape: `[32, 16, 16]`
- text embedding shape: `[384, 768]`

## Intended Use

This dataset is intended for:

- training or fine-tuning text-to-image models from cached latents
- faster experimentation than raw image streaming
- classifier-free guidance training using cached null text embeddings

This dataset is not intended to replace the original source dataset for tasks that require raw images or different crop / resize policies.
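
The classifier-free guidance use case listed above can be sketched as follows. This is a minimal illustration, not the author's training code: `apply_cfg_dropout` is a hypothetical helper, the drop probability of `0.1` is an assumed value, and the placeholder arrays stand in for `null_text.npy` and `null_mask.npy`, which are assumed here to have shapes `[384, 768]` and `[384]` (the README does not state their shapes).

```python
import numpy as np

SEQ_LEN, HIDDEN = 384, 768  # text embedding shape stated in this README


def apply_cfg_dropout(text, mask, null_text, null_mask, drop_prob, rng):
    """Replace each sample's text conditioning with the null embedding
    with probability drop_prob (hypothetical helper, not part of the repo).

    text:      [B, 384, 768] float16 cached T5 embeddings
    mask:      [B, 384] uint8 attention masks
    null_text: [384, 768] unconditional embedding (assumed shape)
    null_mask: [384] unconditional mask (assumed shape)
    Returns (text, mask, dropped), where dropped is a [B] boolean array.
    """
    dropped = rng.random(text.shape[0]) < drop_prob
    text = np.where(dropped[:, None, None], null_text[None], text)
    mask = np.where(dropped[:, None], null_mask[None], mask)
    return text, mask, dropped


# Synthetic stand-ins for a batch of cached samples and the null files.
rng = np.random.default_rng(0)
batch = 8
text = rng.standard_normal((batch, SEQ_LEN, HIDDEN)).astype(np.float16)
mask = np.ones((batch, SEQ_LEN), dtype=np.uint8)
null_text = np.zeros((SEQ_LEN, HIDDEN), dtype=np.float16)  # placeholder for null_text.npy
null_mask = np.zeros(SEQ_LEN, dtype=np.uint8)              # placeholder for null_mask.npy
null_mask[0] = 1

cfg_text, cfg_mask, dropped = apply_cfg_dropout(
    text, mask, null_text, null_mask, drop_prob=0.1, rng=rng
)
print(cfg_text.shape, cfg_mask.shape, int(dropped.sum()))
```

In real training the two placeholders would be replaced by the arrays loaded from the repo, and the same null embedding would serve as the unconditional branch at sampling time.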

## Load Example

### Streaming with `webdataset`

```python
import torch
import matplotlib.pyplot as plt
import webdataset as wds
from huggingface_hub import HfFileSystem, get_token, hf_hub_url
from diffusers import AutoencoderDC
from diffusers.image_processor import VaeImageProcessor

# 1. Set up WebDataset with Hub authentication
fs = HfFileSystem()

# Glob all shards from the repo
files = [
    fs.resolve_path(path)
    for path in fs.glob("hf://datasets/akrao9/512t2ilatent/**/train-*.tar")
]
urls = [hf_hub_url(f.repo_id, f.path_in_repo, repo_type="dataset") for f in files]

# Build one pipe command per shard.
# 'pipe:curl' injects the HF token into the request header; the header is
# skipped when no token is configured, so "Bearer None" is never sent.
token = get_token()
auth = f"-H 'Authorization: Bearer {token}' " if token else ""
wds_urls = [f"pipe:curl -s -L {auth}'{url}'" for url in urls]

# Create the dataset pipeline
dataset = (
    wds.WebDataset(wds_urls)
    .shuffle(100)  # Optional: shuffle buffer
    .decode()      # Decodes bytes into numpy arrays / str / dict based on extension
)

# Grab a single sample
sample = next(iter(dataset))

# 2. Set up device and image processor
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_dtype = torch.bfloat16 if device.type == "cuda" else torch.float32
processor = VaeImageProcessor()

# 3. Load DC-AE (Sana-1.1)
dc_ae = AutoencoderDC.from_pretrained(
    "mit-han-lab/dc-ae-f32c32-sana-1.1-diffusers",
    torch_dtype=model_dtype,
).to(device).eval()

# 4. Prepare latents
# WebDataset .decode() returns .npy files as numpy arrays
latents_np = sample["latents.npy"]
latents = torch.from_numpy(latents_np).unsqueeze(0).to(device, dtype=model_dtype).contiguous()

# 5. Inference and post-processing
with torch.inference_mode():
    # Undo the latent scaling, then decode to image space
    raw_output = dc_ae.decode(latents / dc_ae.config.scaling_factor).sample
    image = processor.postprocess(raw_output, output_type="np")[0]

# 6. Display
plt.figure(figsize=(6, 6))
plt.imshow(image)
plt.axis("off")

# Handle missing captions and truncate long ones for the title
caption = sample.get("caption.txt", "No Caption Found")
plt.title(caption[:100] + "..." if len(caption) > 100 else caption)
plt.show()
```
Provider:
akrao9