chrisxx/minecraft-latents
Source: Hugging Face. Updated 2026-04-09; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/chrisxx/minecraft-latents
Resource description:
---
license: other
task_categories:
- video-classification
tags:
- minecraft
- video
- world-model
- diffusion
- latents
pretty_name: GF-Minecraft DC-AE-lite-f32c32 latents (data_2003)
size_categories:
- 10B<n<100B
---
# GF-Minecraft DC-AE-lite-f32c32 latents (`data_2003`)
Pre-encoded latent shards of [KlingTeam/GameFactory-Dataset](https://huggingface.co/datasets/KlingTeam/GameFactory-Dataset)'s `GF-Minecraft/data_2003` split (mouse + keyboard captures), ready to stream directly into the diffusion world-model trainer in [wendlerc/toy-wm-private](https://github.com/wendlerc/toy-wm-private) on branch `feat/minecraft`.
**Skip the encode step** — the raw dataset is ~138 GB and encoding takes several hours on 4 A6000/A100 GPUs. These shards let you jump straight into training.
## What's in here
- **32 WebDataset tar shards**, `latent-*.tar`, ~53 GB total (about 1.7 GB per shard on average).
- **~2 k clips** of 2000 frames each (≈70 hours of gameplay, GF-Minecraft `data_2003` subset).
- Each tar sample is one clip:
```
sample.__key__ = "<seed>_part_<i>"
sample.latents.npy = (~1999, 32, 11, 20) fp16 # DC-AE-lite-f32c32 latents
sample.actions.npy = (~1999, 14) fp32 # Minecraft1P actions
sample.meta.json = {biome, initial_weather, start_time, source_video, fps}
```
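As a rough sanity check on the figures above, the per-clip and total payload can be estimated from the listed shapes. This is a sketch, not part of the trainer: the clip count of 2000 is an assumption taken from the "~2 k clips" figure, and `.npy` headers, `meta.json`, and tar padding are ignored.

```python
# Back-of-the-envelope size check using the shapes listed in this card.
FRAMES_PER_CLIP = 1999        # approximate; the card lists ~1999 frames per clip
LATENT_SHAPE = (32, 11, 20)   # channels, height, width
ACTION_DIMS = 14
BYTES_FP16 = 2
BYTES_FP32 = 4
FPS = 16


def clip_bytes(frames: int = FRAMES_PER_CLIP) -> int:
    """Raw array payload of one clip: fp16 latents plus fp32 actions."""
    c, h, w = LATENT_SHAPE
    latents = frames * c * h * w * BYTES_FP16
    actions = frames * ACTION_DIMS * BYTES_FP32
    return latents + actions


def dataset_summary(num_clips: int) -> dict:
    """Total payload in GB/GiB and gameplay hours for `num_clips` clips."""
    total = num_clips * clip_bytes()
    return {
        "total_gb": total / 1e9,
        "total_gib": total / 2**30,
        "hours": num_clips * FRAMES_PER_CLIP / FPS / 3600,
    }


if __name__ == "__main__":
    # 2000 clips is an assumption based on the "~2 k clips" figure.
    print(dataset_summary(2000))
```

With 2000 clips this gives roughly 56.5 GB (about 52.6 GiB) and about 69 hours at 16 fps, in line with the totals quoted above.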
### Encoding details
- **VAE:** [`mit-han-lab/dc-ae-lite-f32c32-sana-1.1-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-lite-f32c32-sana-1.1-diffusers). 32× spatial compression, 32 latent channels, lite/distilled variant (the same VAE used for the Doom shards in this project).
- **Frame preprocessing:** 640×360 raw → center-cropped to 640×352 → encoded → `(32, 11, 20)` latent per frame. Stored as fp16 to halve disk usage.
- **fps:** 16 (from the source clips).
- **Source:** `GF-Minecraft/data_2003` only — every clip in that subset where the mouse annotations are present.
Encoding script: [`scripts/encode_gamefactory.py`](https://github.com/wendlerc/toy-wm-private/blob/feat/minecraft/scripts/encode_gamefactory.py).
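The crop and latent-size arithmetic above can be illustrated with a small NumPy sketch. It is an assumption about how a center crop to a multiple of 32 would be done, not a copy of `encode_gamefactory.py`; the helper names `center_crop_to_multiple` and `latent_hw` are hypothetical.

```python
import numpy as np

DOWNSAMPLE = 32  # spatial compression factor of the f32 autoencoder


def center_crop_to_multiple(frame: np.ndarray, multiple: int = DOWNSAMPLE) -> np.ndarray:
    """Center-crop an (H, W, C) frame so H and W are multiples of `multiple`."""
    h, w = frame.shape[:2]
    new_h, new_w = h - h % multiple, w - w % multiple
    top, left = (h - new_h) // 2, (w - new_w) // 2
    return frame[top:top + new_h, left:left + new_w]


def latent_hw(frame: np.ndarray, factor: int = DOWNSAMPLE) -> tuple:
    """Spatial size of the latent grid for an already-cropped frame."""
    h, w = frame.shape[:2]
    return h // factor, w // factor


if __name__ == "__main__":
    raw = np.zeros((360, 640, 3), dtype=np.uint8)   # one 640x360 RGB frame
    cropped = center_crop_to_multiple(raw)
    print(cropped.shape, latent_hw(cropped))
```

For a 640×360 frame this drops 4 rows at the top and 4 at the bottom, leaving 640×352, which maps to the 11×20 latent grid.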
### 14-dim action layout
Matches `Minecraft1P` in [`src/models/actionembs.py`](https://github.com/wendlerc/toy-wm-private/blob/feat/minecraft/src/models/actionembs.py). The loader prepends an uncond flag at load time, so the model sees 15 dims.
| dims | meaning |
|---|---|
| `[0:3]` | one-hot ws (noop / forward / back) |
| `[3:6]` | one-hot ad (noop / left / right) |
| `[6:10]` | one-hot scs (noop / jump / sneak / sprint) |
| `[10]` | `pitch_delta` (degrees; raw × 15) |
| `[11]` | `yaw_delta` (degrees) |
| `[12:14]` | reserved zero |
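To make the layout in the table concrete, the following sketch decodes one 14-dim action row into named fields and shows one way a flag column could be prepended. The function names are hypothetical, and the flag convention (0.0 for conditioned, 1.0 for unconditional) is an assumption; the trainer's loader may differ.

```python
import numpy as np

WS = ("noop", "forward", "back")
AD = ("noop", "left", "right")
SCS = ("noop", "jump", "sneak", "sprint")


def describe_action(a: np.ndarray) -> dict:
    """Decode one 14-dim action vector following the table above."""
    if a.shape != (14,):
        raise ValueError(f"expected shape (14,), got {a.shape}")
    return {
        "ws": WS[int(np.argmax(a[0:3]))],
        "ad": AD[int(np.argmax(a[3:6]))],
        "scs": SCS[int(np.argmax(a[6:10]))],
        "pitch_delta": float(a[10]),
        "yaw_delta": float(a[11]),
    }


def prepend_uncond_flag(actions: np.ndarray, uncond: bool = False) -> np.ndarray:
    """(T, 14) -> (T, 15) with the flag in column 0 (assumed convention)."""
    flag = np.full((actions.shape[0], 1), float(uncond), dtype=actions.dtype)
    return np.concatenate([flag, actions], axis=1)


if __name__ == "__main__":
    a = np.zeros(14, dtype=np.float32)
    a[1] = a[3] = a[9] = 1.0   # forward, no strafe, sprint
    a[10] = 1.5                # pitch_delta in degrees
    print(describe_action(a))
    print(prepend_uncond_flag(a[None]).shape)
```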
## Quickstart — training on a new node
```bash
# 1. Clone the trainer and check out the minecraft branch
git clone git@github.com:wendlerc/toy-wm-private.git toy-wm-minecraft
cd toy-wm-minecraft
git checkout feat/minecraft
# 2. Install deps
uv sync
# 3. Download the latents (53 GB, ~15-30 min on a fast link)
uv run scripts/download_minecraft_latents.py --dst scratch/gf_latents
# 4. Launch training (adjust --nproc_per_node to your GPU count)
CUDA_VISIBLE_DEVICES=0,1,2,3 \
uv run torchrun --standalone --nproc_per_node=4 -m src.main \
--config configs/minecraft_dit_a100.yaml
```
The default config (`configs/minecraft_dit_a100.yaml`) trains a 40M-parameter DiT for 20 k steps, logging to the `minecraft1p-wm` W&B project. It expects the shards at `scratch/gf_latents/` inside the repo root — the download script defaults to that path.
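Before launching a long run it can help to confirm that the shards are where the config expects them. The check below is a minimal sketch: `find_shards` and `check_shards` are hypothetical helpers, and it assumes the download script places the `latent-*.tar` files directly inside `scratch/gf_latents/` rather than in a subdirectory.

```python
from pathlib import Path

EXPECTED_SHARDS = 32  # shard count stated in this card


def find_shards(root: str = "scratch/gf_latents") -> list:
    """Return the sorted latent shard paths directly under `root`."""
    return sorted(Path(root).glob("latent-*.tar"))


def check_shards(root: str = "scratch/gf_latents", expected: int = EXPECTED_SHARDS) -> list:
    """Raise if the number of shards under `root` differs from `expected`."""
    shards = find_shards(root)
    if len(shards) != expected:
        raise FileNotFoundError(
            f"found {len(shards)} shards in {root!r}, expected {expected}"
        )
    return shards


if __name__ == "__main__":
    print(f"{len(check_shards())} shards found")
```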
## Loading the shards without the trainer
```python
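import glob
import io

import numpy as np
import webdataset as wds


def decode(sample):
    """Turn the raw .npy bytes of one sample into NumPy arrays."""
    for k in ("latents.npy", "actions.npy"):
        sample[k] = np.load(io.BytesIO(sample[k]))
    return sample


# WebDataset takes a shard list or brace pattern, so expand the glob explicitly.
shards = sorted(glob.glob("path/to/latent-*.tar"))

ds = wds.WebDataset(shards, shardshuffle=True).map(decode)

for sample in ds:
    print(sample["__key__"], sample["latents.npy"].shape, sample["actions.npy"].shape)
    break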
```
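Each decoded sample holds a full clip of roughly 2000 frames, which is usually longer than a single training sequence. One common approach is to cut a random fixed-length window from each clip, as sketched below. This is an illustration only: `sample_window` is a hypothetical helper, and the window length and the cast to fp32 are assumptions rather than the trainer's actual behavior.

```python
import numpy as np


def sample_window(latents: np.ndarray, actions: np.ndarray, length: int,
                  rng: np.random.Generator) -> tuple:
    """Cut one aligned window of `length` frames from a clip.

    latents: (T, 32, 11, 20) fp16, actions: (T, 14) fp32.
    """
    n = min(len(latents), len(actions))  # guard against off-by-one lengths
    if n < length:
        raise ValueError(f"clip has {n} frames, need at least {length}")
    start = int(rng.integers(0, n - length + 1))  # upper bound is exclusive
    stop = start + length
    # Cast latents up from the fp16 storage format for downstream math.
    return latents[start:stop].astype(np.float32), actions[start:stop]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_latents = np.zeros((50, 32, 11, 20), dtype=np.float16)
    fake_actions = np.zeros((50, 14), dtype=np.float32)
    lat, act = sample_window(fake_latents, fake_actions, length=16, rng=rng)
    print(lat.shape, lat.dtype, act.shape)
```

Applied inside a `.map(...)` step after `decode`, a helper like this would keep the latent and action windows frame-aligned.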
## Credits
This is a derivative of **GameFactory-Dataset** by the Kling team. Please cite their work if you use these latents:
- Dataset: [KlingTeam/GameFactory-Dataset](https://huggingface.co/datasets/KlingTeam/GameFactory-Dataset)
- Paper: *GameFactory: Creating New Games with Generative Interactive Videos* ([arXiv:2501.08325](https://arxiv.org/abs/2501.08325))
The VAE is [MIT HAN Lab](https://huggingface.co/mit-han-lab)'s DC-AE-lite-f32c32-sana-1.1. The encoding pipeline and trainer are from [wendlerc/toy-wm-private](https://github.com/wendlerc/toy-wm-private).
**License:** inherits from the upstream GF-Minecraft dataset. Check [KlingTeam/GameFactory-Dataset](https://huggingface.co/datasets/KlingTeam/GameFactory-Dataset) for the authoritative terms.
Provider:
chrisxx



