MLL-Lab/ENACT

Name: MLL-Lab/ENACT
Creator: MLL-Lab
Published: 2025-11-29 21:08:20
License: MIT (as declared in the dataset card metadata)

Source: Hugging Face. Updated 2025-11-29; indexed 2026-02-07.

Download link:

https://hf-mirror.com/datasets/MLL-Lab/ENACT


Resource description:

---
pretty_name: ENACT
language:
- en
task_categories:
- visual-question-answering
configs:
- config_name: default
  data_files:
  - QA.zip
dataset_info:
  features:
  - name: id
    dtype: string
  - name: type
    dtype: string
  - name: task_name
    dtype: string
  - name: key_frame_ids
    sequence: string
  - name: images
    sequence: string
  - name: question
    dtype: string
  - name: options
    sequence: string
  - name: gt_answer
    sequence: int32
license: mit
tags:
- agent
size_categories:
- 1K<n<10K
---

# ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction

ENACT is a benchmark dataset for evaluating **embodied cognition** in vision–language models via **egocentric world modeling**. It probes whether models can reason about how the world changes under sequences of actions, using long-horizon household activities in a mobile manipulation setting.

- **Project page:** https://enact-embodied-cognition.github.io/
- **Code & evaluation:** https://github.com/mll-lab-nu/ENACT
- **Paper:** https://arxiv.org/abs/2511.20937

## Dataset Summary

Each ENACT example is a **multi-image, multi-step reasoning problem** built from robot trajectories:

- **Forward world modeling**
  - Input: one **current state image**, several **future state images** (shuffled), and a list of **actions in correct order**.
  - Task: output a Python list of integers giving the **correct chronological order of future images** (e.g., `[1, 3, 2]`).
- **Inverse world modeling**
  - Input: an **ordered sequence of images** showing state changes, plus **actions in shuffled order**.
  - Task: output a Python list of integers giving the **correct chronological order of actions** (e.g., `[2, 3, 1]`).

All images are egocentric RGB observations rendered from long-horizon household tasks (e.g., assembling gift baskets, bringing water, preparing lunch boxes, cleaning up a desk).

## File Structure

After unpacking, the dataset has the following structure:

```text
.
├── enact_ordering.jsonl                # All QA examples (one JSON per line)
└── images/
    ├── forward_world_modeling_3_steps/
    ├── forward_world_modeling_4_steps/
    ├── ...
    ├── forward_world_modeling_10_steps/
    ├── inverse_world_modeling_3_steps/
    ├── ...
    └── inverse_world_modeling_10_steps/
```

Each task folder (e.g., `forward_world_modeling_3_steps/`) contains one subfolder per activity, such as:

```text
images/forward_world_modeling_3_steps/
├── assembling_gift_baskets_1749468508582193/
├── bringing_water_1750844141719178/
├── ...
```

Inside each activity folder are the PNGs for that trajectory (current state and future states, or ordered states in the inverse setting).

## JSONL Format

Each line in `enact_ordering.jsonl` is a JSON object:

```json
{
  "id": "assembling_gift_baskets_1749468508582193_forward_world_modeling_3_steps_cfbcc15c",
  "type": "forward_world_modeling_3_steps",
  "task_name": "assembling_gift_baskets_1749468508582193",
  "key_frame_ids": ["4150", "11360", "11834"],
  "images": [
    "QA/images/forward_world_modeling_3_steps/..._cur_state.png",
    "QA/images/forward_world_modeling_3_steps/..._next_state_1.png",
    "QA/images/forward_world_modeling_3_steps/..._next_state_2.png"
  ],
  "question": "...natural language instructions and actions...",
  "options": [],
  "gt_answer": [1, 2]
}
```

* **`id`** – unique identifier for this QA instance.
* **`type`** – question type and horizon, e.g. `forward_world_modeling_3_steps` or `inverse_world_modeling_4_steps`.
* **`task_name`** – underlying household task instance.
* **`key_frame_ids`** – frame indices of selected key frames in the trajectory.
* **`images`** – relative paths to PNG images:
  * index 0 is the **current state**;
  * subsequent entries are **future states** (forward) or later states (inverse).
* **`question`** – natural language prompt specifying the task setup, actions, and the required output as a Python list of integers.
* **`gt_answer`** – ground-truth ordering of image or action labels (list of integers, e.g. `[1, 3, 2]`).
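
The record layout described above can be read and checked with a few lines of Python. The following is a minimal sketch, not the official evaluation: the helper names (`load_examples`, `parse_ordering`, `accuracy_by_type`) and the `replies` mapping from example id to raw model text are hypothetical, and scoring by exact match against `gt_answer` is an assumption. The scripts in the code repository may score differently.

```python
import ast
import json
from collections import defaultdict


def load_examples(jsonl_path):
    """Read one JSON object per non-empty line of the JSONL file."""
    examples = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                examples.append(json.loads(line))
    return examples


def parse_ordering(text):
    """Parse a reply such as "[1, 3, 2]" into a list of ints.

    Returns None when the text is not a Python list literal of integers.
    """
    try:
        value = ast.literal_eval(text.strip())
    except (ValueError, SyntaxError, TypeError):
        return None
    if not isinstance(value, list):
        return None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in value):
        return value
    return None


def accuracy_by_type(examples, replies):
    """Exact-match accuracy per `type` field (assumed metric, see lead-in).

    `replies` is a hypothetical dict mapping example id -> raw reply text.
    Missing or unparsable replies count as incorrect.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ex in examples:
        totals[ex["type"]] += 1
        prediction = parse_ordering(replies.get(ex["id"], ""))
        if prediction == ex["gt_answer"]:
            hits[ex["type"]] += 1
    return {t: hits[t] / totals[t] for t in totals}
```

A typical call would be `accuracy_by_type(load_examples("enact_ordering.jsonl"), replies)`, where the path depends on where the archive was unpacked. Grouping by `type` keeps forward and inverse questions, and the different step horizons, separate in the report.
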
## Usage

To evaluate, follow the scripts in the code repository: [https://github.com/mll-lab-nu/ENACT](https://github.com/mll-lab-nu/ENACT)

## Citation

If you use ENACT, please cite the paper:

```
@article{wang2025enact,
  title={ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction},
  author={Wang, Qineng and Huang, Wenlong and Zhou, Yu and Yin, Hang and Bao, Tianwei and Lyu, Jianwen and Liu, Weiyu and Zhang, Ruohan and Wu, Jiajun and Li, Fei-Fei and Li, Manling},
  journal={arXiv preprint arXiv:2511.20937},
  year={2025}
}
```

Provided by:

MLL-Lab
