robomotic/lewm-breakout-plays
Source: Hugging Face. Updated 2026-04-14; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/robomotic/lewm-breakout-plays
Resource description:
---
title: LeWM Breakout Plays Dataset
emoji: 🏓
colorFrom: blue
colorTo: purple
sdk: static
---
# LeWM Breakout Plays Dataset
This repository contains Atari Breakout trajectories generated by various heuristic policies for the LeWorldModel (LeWM) project.
The datasets are organized by the heuristic policy that generated them, and each contains the corresponding Parquet files and statistics.
## Available Heuristics
- **center**: A policy that constantly tries to keep the paddle in the exact center of the screen. This generates data with minimal paddle variance, useful for causal inference on object presence.
- **heuristic**: An expert-level algorithmic policy that tracks the ball's X position dynamically to successfully hit it. This yields high-reward, long-episode sequences.
- **passive**: A completely inactive policy that performs no movement actions, generating sequences that show how the game evolves without player interaction.
- **random_active**: A policy that selects actions randomly but ensures the game starts (by occasionally pressing FIRE). Generates diverse, short-lived chaotic trajectories.
- **wall_hugger**: A policy that drives the paddle to one of the walls and stays there. This creates a data distribution where the paddle's spatial location is extremely biased.
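The five policies can be sketched as simple rules over the paddle and ball x coordinates. This is a hypothetical illustration, not the code that generated the dataset. The following are all assumptions:

- the state interface (`paddle_x` and `ball_x`, with `None` meaning no ball in play);
- the tolerance and the FIRE probability;
- the action ids, assumed to follow the ALE Breakout minimal action set (NOOP=0, FIRE=1, RIGHT=2, LEFT=3).

```python
import random

# Assumed ALE Breakout minimal action set; check against your environment's action meanings.
NOOP, FIRE, RIGHT, LEFT = 0, 1, 2, 3
SCREEN_WIDTH = 160  # Atari 2600 frames are 160 pixels wide


def move_towards(paddle_x, target_x, tol=2.0):
    """Step the paddle toward target_x, idling once within tol pixels (hypothetical helper)."""
    if target_x > paddle_x + tol:
        return RIGHT
    if target_x < paddle_x - tol:
        return LEFT
    return NOOP


def center_policy(paddle_x, ball_x):
    # Hold the paddle at the horizontal centre regardless of the ball.
    return move_towards(paddle_x, SCREEN_WIDTH / 2)


def heuristic_policy(paddle_x, ball_x):
    # Launch a ball when none is visible, otherwise follow the ball's x position.
    if ball_x is None:
        return FIRE
    return move_towards(paddle_x, ball_x)


def passive_policy(paddle_x, ball_x):
    # Never move the paddle.
    return NOOP


def random_active_policy(paddle_x, ball_x, rng=random, fire_prob=0.1):
    # Occasionally press FIRE so the game starts; otherwise act uniformly at random.
    if rng.random() < fire_prob:
        return FIRE
    return rng.choice([NOOP, RIGHT, LEFT])


def wall_hugger_policy(paddle_x, ball_x, side="left"):
    # Keep pushing toward one wall; the paddle stops there and stays.
    return LEFT if side == "left" else RIGHT
```

Each policy maps the current state to one discrete action per frame. Any of these functions can be plugged into a rollout loop that records frames, actions and labels.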
---
## 🔬 Representation Quality Evaluation
These datasets are used to train and evaluate the **LeWorldModel (LeWM)**, a JEPA-based world model for Atari Breakout. The quality of the learned representations is measured via **MLP probing** on frozen latent embeddings from the trained encoder and its causal predictor.
### Probe Procedure
After training, lightweight MLP probes (3-layer: 256→128→output) and linear probes are trained on top of **frozen** encoder/predictor representations. No gradient flows back through the JEPA model. Each probe addresses a specific question about what physics the encoder has absorbed:
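A frozen-feature probe only needs the precomputed embeddings as arrays, so no gradient can reach the encoder or predictor. The NumPy sketch below makes the following assumptions:

- embeddings `z` have shape (N, D) and targets `y` have shape (N, K);
- the linear probe is fit by closed-form ridge regression;
- the 256→128 MLP probe is trained by full-batch gradient descent on mean squared error.

The source does not say how the probes are optimised, so both optimisers are assumptions. The names `fit_linear_probe` and `MLPProbe` are hypothetical.

```python
import numpy as np


def fit_linear_probe(z, y, l2=1e-3):
    """Closed-form ridge regression from frozen embeddings z (N, D) to targets y (N, K)."""
    z1 = np.hstack([z, np.ones((z.shape[0], 1))])  # append a bias column
    reg = l2 * np.eye(z1.shape[1])
    reg[-1, -1] = 0.0  # do not penalise the bias term
    return np.linalg.solve(z1.T @ z1 + reg, z1.T @ y)


def predict_linear_probe(w, z):
    return np.hstack([z, np.ones((z.shape[0], 1))]) @ w


class MLPProbe:
    """Input -> 256 -> 128 -> output with ReLU, trained by full-batch gradient descent on MSE."""

    def __init__(self, d_in, d_out, hidden=(256, 128), seed=0):
        rng = np.random.default_rng(seed)
        sizes = [d_in, *hidden, d_out]
        # He initialisation for ReLU layers
        self.W = [rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(b) for b in sizes[1:]]

    def _forward(self, z):
        acts = [z]
        for i, (W, b) in enumerate(zip(self.W, self.b)):
            h = acts[-1] @ W + b
            acts.append(np.maximum(h, 0.0) if i < len(self.W) - 1 else h)
        return acts

    def predict(self, z):
        return self._forward(z)[-1]

    def step(self, z, y, lr=1e-3):
        """One gradient step; returns the MSE measured before the update."""
        acts = self._forward(z)
        err = acts[-1] - y
        grad = 2.0 * err / err.size  # d(mean squared error) / d(output)
        for i in reversed(range(len(self.W))):
            grad_W = acts[i].T @ grad
            grad_b = grad.sum(axis=0)
            if i > 0:
                # backpropagate through the ReLU that produced acts[i], using the pre-update weights
                grad = (grad @ self.W[i].T) * (acts[i] > 0)
            self.W[i] -= lr * grad_W
            self.b[i] -= lr * grad_b
        return float(np.mean(err ** 2))


# Toy usage: random vectors stand in for frozen CLS embeddings.
rng = np.random.default_rng(0)
z = rng.normal(size=(512, 32))
y = 0.5 * z[:, :2]  # toy 2-D target standing in for ball (x, y)
w = fit_linear_probe(z, y)
probe = MLPProbe(d_in=32, d_out=2)
losses = [probe.step(z, y) for _ in range(200)]
```

Because `z` is a plain array, the "frozen" property holds by construction: only the probe's own weights are ever updated.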
| Task | Input | Target | Why |
| :--- | :--- | :--- | :--- |
| `ball_pos` | Encoder CLS at $t$ | Ball $(x,y)$ at $t$ | Does the encoder localise the ball? |
| `ball_vel` | Encoder CLS at $t$ | Ball $(v_x,v_y)$ at $t$ | Does the encoder track motion? |
| `ball_pos_pred` | ARPredictor output at $t$ | Ball $(x,y)$ at $t+1$ | Does the world model predict next-frame position? |
| `ball_vel_pred` | ARPredictor output at $t$ | Ball $(v_x,v_y)$ at $t+1$ | Does the predictor encode implicit momentum? |
| `paddle_pos` | Encoder CLS at $t$ | Paddle $x$ at $t$ | Does the encoder locate the controlled paddle? |
| `paddle_pos_act` | CLS + action at $t$ | Paddle $x$ at $t$ | How much does action conditioning improve paddle prediction? |
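Pairing inputs and targets for these tasks is mostly index bookkeeping. The sketch below is hypothetical and rests on three assumptions:

- per-frame labels are stored as arrays;
- velocity is a one-step finite difference of position;
- the action is appended as a one-hot vector over an assumed four-action space.

The original pipeline may derive velocities and encode actions differently.

```python
import numpy as np


def finite_difference_velocity(pos):
    """v[t] = pos[t] - pos[t-1]; the first frame has no predecessor, so v[0] = 0 (assumption)."""
    pos = np.asarray(pos, dtype=float)
    vel = np.zeros_like(pos)
    vel[1:] = pos[1:] - pos[:-1]
    return vel


def align(features, labels, shift):
    """Pair features at t with labels at t + shift: 0 for encoder tasks, 1 for *_pred tasks."""
    if shift == 0:
        return features, labels
    return features[:-shift], labels[shift:]


def append_action(cls, actions, n_actions=4):
    """Concatenate a one-hot action to each CLS vector, as input to a paddle_pos_act-style probe."""
    onehot = np.eye(n_actions)[np.asarray(actions)]
    return np.concatenate([cls, onehot], axis=1)
```

With `shift=1`, the last feature row and the first label row are dropped, so every predictor output at $t$ is scored against the label at $t+1$.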
The baselines are:

1. A **static persistence** predictor that copies the current label as the prediction for the next step.
2. A **randomly initialised** ViT encoder, a sanity check that should score at or below R² ≈ 0.
3. A **patch-mean** alternative to the CLS token.
4. **Linear probes** that measure linear decodability.
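The R² metric and two of the baselines are short enough to state directly. In this sketch:

- R² is averaged uniformly over target dimensions, the same convention as scikit-learn's default `r2_score`.
- The static-copy baseline predicts the label at $t+1$ with the label at $t$.
- The ViT token layout (CLS at index 0, followed by patch tokens) is an assumption about the encoder output.

The function names are hypothetical.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination, averaged uniformly over target dimensions."""
    y_true = np.asarray(y_true, dtype=float).reshape(len(y_true), -1)
    y_pred = np.asarray(y_pred, dtype=float).reshape(len(y_pred), -1)
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))


def static_copy_r2(labels):
    """Persistence baseline: use the label at t as the prediction for t+1."""
    labels = np.asarray(labels, dtype=float)
    return r2(labels[1:], labels[:-1])


def pool_tokens(tokens):
    """Split ViT output (N, 1 + P, D) into the CLS vector and the mean of the P patch tokens."""
    cls = tokens[:, 0]
    patch_mean = tokens[:, 1:].mean(axis=1)
    return cls, patch_mean
```

An R² below zero means the probe does worse than always predicting the test-set mean. Many of the entries in the tables that follow should be read that way.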
### Probe Results — 100-Epoch Reference Run
**Evaluation split**: held-out `my_datasets_eval/` (≈13,225 sequences from heuristic + random_active policies, never seen during training).
#### MLP Probe — R² (test set)
| Task | ep 1 | ep 10 | ep 25 | ep 50 | ep 100 |
| :--- | :---: | :---: | :---: | :---: | :---: |
| `ball_pos` | +0.178 | **+0.306** | +0.212 | +0.234 | +0.272 |
| `ball_vel` | -0.222 | -0.314 | -0.269 | -0.296 | -0.271 |
| `ball_pos_pred` | +0.131 | +0.243 | +0.230 | +0.226 | +0.253 |
| `ball_vel_pred` | -0.144 | -0.299 | -0.260 | -0.249 | -0.270 |
| `paddle_pos` | -0.253 | -0.548 | -0.370 | -0.562 | **+0.319** |
| `paddle_pos_act` | -0.169 | -0.170 | -0.187 | -0.536 | **+0.456** |
#### Linear Probe — R² (test set)
| Task | ep 1 | ep 10 | ep 25 | ep 50 | ep 100 |
| :--- | :---: | :---: | :---: | :---: | :---: |
| `ball_pos` | -0.143 | +0.244 | +0.249 | +0.327 | +0.301 |
| `ball_vel` | -0.022 | -0.062 | -0.059 | -0.094 | -0.078 |
| `ball_pos_pred` | -0.194 | +0.198 | +0.282 | +0.278 | +0.276 |
| `ball_vel_pred` | -0.017 | -0.031 | -0.040 | -0.058 | -0.068 |
| `paddle_pos` | -0.384 | -0.173 | -0.110 | -0.009 | **+0.478** |
| `paddle_pos_act` | -0.076 | +0.077 | +0.143 | +0.183 | **+0.607** |
#### Sanity-Check Baselines
| Baseline | `ball_pos` R² | `ball_vel` R² | `paddle_pos` R² |
| :--- | :---: | :---: | :---: |
| Static copy (predict $t$ as $t+1$) | +0.955 | +0.821 | +0.778 |
| Random encoder | -0.351 | -0.018 | -0.034 |
### Key Findings
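- **Ball position is learned early**: The MLP probe R² reaches +0.31 by epoch 10. The linear probe rises from −0.14 at epoch 1 to a peak of +0.33 at epoch 50, then settles at +0.30 at epoch 100. The representations therefore become substantially more linearly decodable during training, though not monotonically.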
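- **Ball velocity is not encoded**: R² is negative at every epoch for both MLP and linear probes. This holds for the encoder (`ball_vel`) and the predictor (`ball_vel_pred`) alike. A plausible explanation is that JEPA training provides no explicit signal to retain first-order dynamics.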
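- **Predictor tracks ball position nearly as well as the encoder**: At epoch 100, the MLP probe on the ARPredictor output reaches R² +0.25 for the next-frame position. This is within 0.02 of the encoder's +0.27 for the current frame. The predictor therefore carries next-frame position information, although both scores remain far below the static-copy baseline (+0.955).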
- **Paddle position emerges late**: Validation R² grows steadily, reaching +0.92 at epoch 100. Test R² stays negative until epoch 100, when it jumps to +0.32 (MLP) and +0.48 (linear). The gap between validation and test is likely a distribution-shift effect of the held-out policy mix.
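- **Action conditioning is effective**: At epoch 100, adding the action to the CLS input raises paddle R² by +0.13 with the linear probe (+0.478 → +0.607) and by +0.14 with the MLP probe (+0.319 → +0.456). This suggests that the action supplies paddle information not captured by the CLS embedding alone.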
- **Random encoder is uninformative**: Under a random ViT, `ball_pos` R² is ≈ −0.35. This indicates that the probe gains come from training rather than from raw pixel statistics.
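The arithmetic behind these findings can be rechecked directly from the R² tables. The dictionaries below copy the relevant test-set values. The helper names are hypothetical.

```python
# Test-set R² values copied from the MLP and linear probe tables (epochs 1, 10, 25, 50, 100).
MLP = {
    "ball_pos": [0.178, 0.306, 0.212, 0.234, 0.272],
    "ball_pos_pred": [0.131, 0.243, 0.230, 0.226, 0.253],
    "paddle_pos": [-0.253, -0.548, -0.370, -0.562, 0.319],
    "paddle_pos_act": [-0.169, -0.170, -0.187, -0.536, 0.456],
}
LINEAR = {
    "ball_pos": [-0.143, 0.244, 0.249, 0.327, 0.301],
    "paddle_pos": [-0.384, -0.173, -0.110, -0.009, 0.478],
    "paddle_pos_act": [-0.076, 0.077, 0.143, 0.183, 0.607],
}


def is_non_decreasing(values):
    """True if every epoch's R² is at least the previous epoch's."""
    return all(b >= a for a, b in zip(values, values[1:]))


def final_gap(table, a, b):
    """Difference in epoch-100 R² between tasks a and b."""
    return table[a][-1] - table[b][-1]


encoder_vs_predictor = final_gap(MLP, "ball_pos", "ball_pos_pred")  # absolute R² difference
relative_gap = encoder_vs_predictor / MLP["ball_pos"][-1]           # as a fraction of the encoder score
action_gain_linear = final_gap(LINEAR, "paddle_pos_act", "paddle_pos")
action_gain_mlp = final_gap(MLP, "paddle_pos_act", "paddle_pos")
linear_ball_monotonic = is_non_decreasing(LINEAR["ball_pos"])
```

On these values the script produces:

- an encoder/predictor gap of about 0.019 R², roughly 7% of the encoder's score;
- action gains of about 0.13 (linear) and 0.14 (MLP);
- `linear_ball_monotonic` equal to `False`, because the linear `ball_pos` probe dips from +0.327 at epoch 50 to +0.301 at epoch 100.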
For full methodology, all results, and reproduction instructions, see the companion model repository at [robomotic/lewm-breakout](https://huggingface.co/robomotic/lewm-breakout).
Provided by:
robomotic



