TESS-Computer/atari-vla-stage1-15hz

Name: TESS-Computer/atari-vla-stage1-15hz
Creator: TESS-Computer
Published: 2025-12-10 05:58:32
License: No description available

Hugging Face · Updated 2025-12-10 · Indexed 2025-12-20

Download link:

https://hf-mirror.com/datasets/TESS-Computer/atari-vla-stage1-15hz

Resource description:

---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: game
    dtype: string
  - name: trial_id
    dtype: int32
  - name: episode_id
    dtype: int32
  - name: chunk_idx
    dtype: int32
  - name: frame_start
    dtype: int32
  - name: action
    dtype: string
  - name: action_ints
    dtype: string
  - name: score
    dtype: int32
  - name: reward_sum
    dtype: int32
  - name: gaze_positions
    dtype: string
  - name: image_bytes
    dtype: binary
license: mit
task_categories:
- robotics
- reinforcement-learning
tags:
- atari
- vla
- vision-language-action
- imitation-learning
- human-demonstrations
- action-chunking
size_categories:
- 1M<n<10M
---

# TESS-Atari Stage 1 (15Hz)

Human gameplay demonstrations from Atari games with **action chunking**, formatted for Vision-Language-Action (VLA) model training.

## Overview

| Metric | Value |
|--------|-------|
| Source | [Atari-HEAD](https://zenodo.org/records/3451402) |
| Games | 11 (overlapping with the DIAMOND benchmark) |
| Samples | ~1.3M |
| Observation Rate | 5 Hz |
| Action Rate | 15 Hz (3 actions per observation) |
| Format | Lumine-style action tokens |

## Why Action Chunking?

VLA models run inference at roughly 5 Hz, but an Atari agent acts at 15 Hz (the emulator's 60 frames per second with frame_skip=4). Action chunking predicts 3 actions per observation, matching the game's effective action rate while accommodating the slower model inference.
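The chunk size of 3 follows directly from the rates above. A minimal sketch of that arithmetic, assuming the standard 60 Hz Atari frame rate; the function and constant names are illustrative, not part of the dataset:

```python
def actions_per_observation(frame_rate_hz: int, frame_skip: int, obs_rate_hz: int) -> int:
    """Number of actions one observation must cover so the policy keeps up
    with the game's effective action rate (frame_rate_hz / frame_skip)."""
    if frame_rate_hz % frame_skip:
        raise ValueError("frame rate must be divisible by frame_skip")
    action_rate_hz = frame_rate_hz // frame_skip  # 60 // 4 = 15 Hz
    if action_rate_hz % obs_rate_hz:
        raise ValueError("action rate must be divisible by the observation rate")
    return action_rate_hz // obs_rate_hz  # 15 // 5 = 3 actions per chunk


ATARI_FRAME_RATE_HZ = 60  # native Atari 2600 (NTSC) frame rate
FRAME_SKIP = 4            # frame_skip stated on this card
VLA_RATE_HZ = 5           # approximate VLA inference rate stated on this card

chunk = actions_per_observation(ATARI_FRAME_RATE_HZ, FRAME_SKIP, VLA_RATE_HZ)
frames_per_chunk = chunk * FRAME_SKIP  # emulator frames covered by one observation
print(chunk, frames_per_chunk)  # 3 12
```

With a 15 Hz observation rate the same calculation gives 1, i.e. one action per observation and no chunking.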
```
Observation (5 Hz) → VLA → 3 Actions (executed at 15 Hz)
```

## Games Included

Alien, Asterix, BankHeist, Breakout, DemonAttack, Freeway, Frostbite, Hero, MsPacman, RoadRunner, Seaquest

## Action Format

```
<|action_start|> RIGHT ; RIGHT ; FIRE <|action_end|>
<|action_start|> LEFT ; LEFT ; LEFT <|action_end|>
<|action_start|> NOOP ; UP ; UPFIRE <|action_end|>
```

## Schema

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique sample ID: `{game}_{trial}_{chunk}` |
| `game` | string | Game name (lowercase) |
| `trial_id` | int | Human player trial number |
| `episode_id` | int | Episode within the trial (-1 if unknown) |
| `chunk_idx` | int | Chunk sequence number |
| `frame_start` | int | Index of the first frame in this chunk |
| `action` | string | Lumine-style chunked action token |
| `action_ints` | string | Raw ALE action codes, comma-separated, e.g. `"4,4,1"` |
| `score` | int | Score at the start of the chunk |
| `reward_sum` | int | Total reward over the chunk's 3 frames |
| `gaze_positions` | string | Eye-tracking data from the first frame |
| `image_bytes` | bytes | PNG of the first frame in the chunk |

## Usage

```python
from io import BytesIO

from datasets import load_dataset
from PIL import Image

ds = load_dataset("TESS-Computer/atari-vla-stage1-15hz")

# Get a sample
sample = ds["train"][0]
print(sample["action"])  # e.g. <|action_start|> RIGHT ; RIGHT ; FIRE <|action_end|>

# Parse the individual ALE action codes as integers
actions = [int(a) for a in sample["action_ints"].split(",")]  # e.g. [4, 4, 1]

# Decode the first frame of the chunk
img = Image.open(BytesIO(sample["image_bytes"]))
```

## Evaluation

Use with [DIAMOND](https://diamond-wm.github.io/) world models (frame_skip=4). Execute the 3 predicted actions sequentially at each observation step.
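The Action Format, Schema, and Evaluation sections can be tied together in a small sketch: parse a chunked token, map action names to codes, and run the "one observation, 3 sequential actions" loop. This assumes that both the token names and `action_ints` follow ALE's full 18-action set in code order; `policy` and `env_step` are hypothetical callables standing in for a VLA model and a DIAMOND (or ALE) environment, not real APIs of either project.

```python
import re
from typing import Callable, List, Sequence, Tuple

# ALE's full 18-action set, listed in action-code order.
# Assumption: this dataset's names and `action_ints` codes follow this mapping.
ALE_ACTIONS = [
    "NOOP", "FIRE", "UP", "RIGHT", "LEFT", "DOWN",
    "UPRIGHT", "UPLEFT", "DOWNRIGHT", "DOWNLEFT",
    "UPFIRE", "RIGHTFIRE", "LEFTFIRE", "DOWNFIRE",
    "UPRIGHTFIRE", "UPLEFTFIRE", "DOWNRIGHTFIRE", "DOWNLEFTFIRE",
]

_TOKEN_RE = re.compile(r"<\|action_start\|>(.*?)<\|action_end\|>", re.DOTALL)


def parse_action_token(token: str) -> List[str]:
    """'<|action_start|> A ; B ; C <|action_end|>' -> ['A', 'B', 'C']."""
    match = _TOKEN_RE.search(token)
    if match is None:
        raise ValueError(f"not an action token: {token!r}")
    return [name.strip() for name in match.group(1).split(";")]


def parse_action_ints(field: str) -> List[int]:
    """'4,4,1' -> [4, 4, 1]."""
    return [int(code) for code in field.split(",")]


def names_to_codes(names: Sequence[str]) -> List[int]:
    """Map action names to their index in ALE_ACTIONS."""
    return [ALE_ACTIONS.index(name) for name in names]


# Hypothetical interfaces: `policy` maps one observation to one chunked action
# token; `env_step` applies one action code (one agent step, i.e. frame_skip
# emulator frames) and returns (observation, reward, done).
Policy = Callable[[object], str]
EnvStep = Callable[[int], Tuple[object, float, bool]]


def run_chunked(policy: Policy, env_step: EnvStep, obs: object, num_observations: int) -> float:
    """Query the policy once per observation, execute every action in the
    returned chunk sequentially, and return the total reward collected."""
    total = 0.0
    for _ in range(num_observations):
        for code in names_to_codes(parse_action_token(policy(obs))):
            obs, reward, done = env_step(code)
            total += reward
            if done:
                return total
    return total
```

The policy only sees the observation returned after the last action of each chunk, so it is queried once every 3 agent steps (5 Hz against a 15 Hz action rate), which is the timing the dataset's chunks were built around.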
## Related

- [5Hz variant](https://huggingface.co/datasets/TESS-Computer/atari-vla-stage1-5hz) - Single action per observation (simpler but slower)
- [Lumine AI](https://www.lumine-ai.org/) - Inspiration for the VLA architecture and action chunking
- [DIAMOND](https://diamond-wm.github.io/) - World model for evaluation

## Citation

```bibtex
@misc{atarihead2019,
  title={Atari-HEAD: Atari Human Eye-Tracking and Demonstration Dataset},
  author={Zhang, Ruohan and others},
  year={2019},
  url={https://zenodo.org/records/3451402}
}
```

Provided by:

TESS-Computer
