chrisxx/minecraft-latents

Name: chrisxx/minecraft-latents
Creator: chrisxx
Published: 2026-04-09 14:52:03
License: no description provided

Source: Hugging Face. Updated 2026-04-09; indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/chrisxx/minecraft-latents

Description:

---
license: other
task_categories:
- video-classification
tags:
- minecraft
- video
- world-model
- diffusion
- latents
pretty_name: GF-Minecraft DC-AE-lite-f32c32 latents (data_2003)
size_categories:
- 10B<n<100B
---

# GF-Minecraft DC-AE-lite-f32c32 latents (`data_2003`)

Pre-encoded latent shards of [KlingTeam/GameFactory-Dataset](https://huggingface.co/datasets/KlingTeam/GameFactory-Dataset)'s `GF-Minecraft/data_2003` split (mouse + keyboard captures), ready to stream directly into the diffusion world-model trainer in [wendlerc/toy-wm-private](https://github.com/wendlerc/toy-wm-private) on branch `feat/minecraft`.

**Skip the encode step:** the raw dataset is ~138 GB and encoding takes several hours on 4 A6000/A100 GPUs. These shards let you jump straight into training.

## What's in here

- **32 WebDataset tar shards**, `latent-*.tar`, ~2 GB each, ~53 GB total.
- **~2k clips** of 2000 frames each (≈70 hours of gameplay, GF-Minecraft `data_2003` subset).
- Each tar sample is one clip:

  ```
  sample.__key__      = "<seed>_part_<i>"
  sample.latents.npy  = (~1999, 32, 11, 20)  fp16   # DC-AE-lite-f32c32 latents
  sample.actions.npy  = (~1999, 14)          fp32   # Minecraft1P actions
  sample.meta.json    = {biome, initial_weather, start_time, source_video, fps}
  ```

### Encoding details

- **VAE:** [`mit-han-lab/dc-ae-lite-f32c32-sana-1.1-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-lite-f32c32-sana-1.1-diffusers). 32× spatial, 32 channel, lite/distilled variant (same VAE the doom shards in this project use).
- **Frame preprocessing:** 640×360 raw → center-cropped to 640×352 → encoded → `(32, 11, 20)` latent per frame. Stored as fp16 to halve disk usage.
- **fps:** 16 (from the source clips).
- **Source:** `GF-Minecraft/data_2003` only; every clip in that subset where the mouse annotations are present.

Encoding script: [`scripts/encode_gamefactory.py`](https://github.com/wendlerc/toy-wm-private/blob/feat/minecraft/scripts/encode_gamefactory.py).

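The shape arithmetic behind the frame preprocessing can be checked with a short sketch. It is not taken from the encoding script; the helper names `center_crop_box` and `latent_shape` are hypothetical, and the only inputs are the numbers quoted above (640×360 frames, a 640×352 crop, 32× spatial downsampling, 32 channels). The crop height is 352 presumably because 360 is not a multiple of 32, while 352 = 11 × 32.

```python
def center_crop_box(width, height, crop_w, crop_h):
    """Return (left, top, right, bottom) of a centered crop window."""
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


def latent_shape(crop_w, crop_h, downsample=32, channels=32):
    """Latent (C, H, W) for an autoencoder with the given spatial factor."""
    if crop_w % downsample or crop_h % downsample:
        raise ValueError("crop size must be a multiple of the downsample factor")
    return channels, crop_h // downsample, crop_w // downsample


# Hypothetical helpers applied to the numbers stated in the dataset card.
box = center_crop_box(640, 360, 640, 352)   # removes 4 rows at top and bottom
shape = latent_shape(640, 352)              # (channels, height, width)
bytes_per_frame_fp16 = shape[0] * shape[1] * shape[2] * 2  # 2 bytes per fp16 value

print(box)                   # (0, 4, 640, 356)
print(shape)                 # (32, 11, 20)
print(bytes_per_frame_fp16)  # 14080
```

At 14,080 bytes per frame, about 2,000 clips of roughly 1,999 frames each come to around 56 × 10⁹ bytes, which is in line with the ~53 GB total quoted above.
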
### 14-dim action layout

Matches `Minecraft1P` in [`src/models/actionembs.py`](https://github.com/wendlerc/toy-wm-private/blob/feat/minecraft/src/models/actionembs.py). The loader prepends a 15th uncond flag at load time.

| dims | meaning |
|---|---|
| `[0:3]` | one-hot ws (noop / forward / back) |
| `[3:6]` | one-hot ad (noop / left / right) |
| `[6:10]` | one-hot scs (noop / jump / sneak / sprint) |
| `[10]` | `pitch_delta` (degrees; raw × 15) |
| `[11]` | `yaw_delta` (degrees) |
| `[12:14]` | reserved zero |

## Quickstart: training on a new node

```bash
# 1. Clone the trainer and check out the minecraft branch
git clone git@github.com:wendlerc/toy-wm-private.git toy-wm-minecraft
cd toy-wm-minecraft
git checkout feat/minecraft

# 2. Install deps
uv sync

# 3. Download the latents (53 GB, ~15-30 min on a fast link)
uv run scripts/download_minecraft_latents.py --dst scratch/gf_latents

# 4. Launch training (adjust --nproc_per_node to your GPU count)
CUDA_VISIBLE_DEVICES=0,1,2,3 \
uv run torchrun --standalone --nproc_per_node=4 -m src.main \
    --config configs/minecraft_dit_a100.yaml
```

The default config (`configs/minecraft_dit_a100.yaml`) trains a 40M-parameter DiT for 20k steps, logging to the `minecraft1p-wm` W&B project. It expects the shards at `scratch/gf_latents/` inside the repo root; the download script defaults to that path.

## Loading the shards without the trainer

```python
import glob
import io

import numpy as np
import webdataset as wds


def decode(sample):
    # Each .npy entry arrives as raw bytes; parse it into a NumPy array.
    for k in ("latents.npy", "actions.npy"):
        sample[k] = np.load(io.BytesIO(sample[k]))
    return sample


# WebDataset expands brace patterns, not shell globs, so resolve the shard list first.
shards = sorted(glob.glob("path/to/latent-*.tar"))

ds = (
    wds.WebDataset(shards, shardshuffle=True)
    .map(decode)
)

for sample in ds:
    print(sample["__key__"], sample["latents.npy"].shape, sample["actions.npy"].shape)
    break
```

## Credits

This is a derivative of **GameFactory-Dataset** by the Kling team. Please cite their work if you use these latents:

- Dataset: [KlingTeam/GameFactory-Dataset](https://huggingface.co/datasets/KlingTeam/GameFactory-Dataset)
- Paper: *GameFactory: Creating New Games with Generative Interactive Videos* ([arXiv:2501.08325](https://arxiv.org/abs/2501.08325))

The VAE is [MIT HAN Lab](https://huggingface.co/mit-han-lab)'s DC-AE-lite-f32c32-sana-1.1. The encoding pipeline and trainer are from [wendlerc/toy-wm-private](https://github.com/wendlerc/toy-wm-private).

**License:** inherits from the upstream GF-Minecraft dataset. Check [KlingTeam/GameFactory-Dataset](https://huggingface.co/datasets/KlingTeam/GameFactory-Dataset) for the authoritative terms.
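
## Appendix: reading one action row

As a companion to the 14-dim action layout table, here is a minimal sketch of how a single row of `actions.npy` could be unpacked into named fields. It is not the trainer's code: `describe_action` and the label tuples are hypothetical, and it assumes the one-hot index order matches the order in which the table lists the options (for example, index 0 = noop, 1 = forward, 2 = back for `ws`).

```python
# Label order is an assumption: it follows the order listed in the layout table.
WS = ("noop", "forward", "back")
AD = ("noop", "left", "right")
SCS = ("noop", "jump", "sneak", "sprint")


def _argmax(values):
    """Index of the largest entry in a short sequence."""
    return max(range(len(values)), key=lambda i: values[i])


def describe_action(row):
    """Unpack one 14-dim action row into named fields (hypothetical helper)."""
    if len(row) != 14:
        raise ValueError(f"expected 14 values, got {len(row)}")
    return {
        "ws": WS[_argmax(row[0:3])],
        "ad": AD[_argmax(row[3:6])],
        "scs": SCS[_argmax(row[6:10])],
        "pitch_delta": float(row[10]),  # degrees, per the table
        "yaw_delta": float(row[11]),    # degrees, per the table
        # row[12:14] is reserved and expected to be zero.
    }


# Hand-built example row: forward, no strafe, jump, small yaw change.
example = [0.0] * 14
example[1] = 1.0
example[3] = 1.0
example[7] = 1.0
example[11] = 2.5

result = describe_action(example)
print(result)
```

The helper uses only indexing and slicing, so it should also accept a row taken from the NumPy array loaded from `actions.npy`.
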

Provider:

chrisxx
