EmbodiedCity/EmbodiedNav-Bench
Hugging Face · updated 2026-04-25 · indexed 2026-05-10
Download link:
https://hf-mirror.com/datasets/EmbodiedCity/EmbodiedNav-Bench
---
license: cc-by-4.0
pretty_name: EmbodiedNav-Bench
language:
- en
task_categories:
- visual-question-answering
- reinforcement-learning
tags:
- embodied-ai
- embodied-navigation
- urban-airspace
- drone-navigation
- multimodal-reasoning
- spatial-reasoning
size_categories:
- 1K<n<10K
configs:
- config_name: default
  data_files:
  - split: test
    path: viewer-00000-of-00001.parquet
---
# EmbodiedNav-Bench
[GitHub repository](https://github.com/serenditipy-AC/Embodied-Navigation-Bench)
[Paper (arXiv:2604.07973)](https://arxiv.org/html/2604.07973v1)
EmbodiedNav-Bench is a goal-oriented embodied navigation benchmark for evaluating spatial action in urban 3D airspace. The benchmark contains 5,037 high-quality navigation trajectories with natural-language navigation goals, initial drone poses, target positions, and ground-truth 3D trajectories.
This Hugging Face repository hosts the dataset artifacts. The accompanying project code, simulator setup, media examples, and evaluation scripts are maintained in the GitHub repository: https://github.com/serenditipy-AC/Embodied-Navigation-Bench
## Dataset Summary
Each of the 5,037 samples corresponds to one goal-oriented navigation task in an urban 3D environment and pairs a natural-language goal description with a human-collected ground-truth trajectory.
The dataset is intended for evaluating embodied navigation, spatial reasoning, and multimodal decision-making models in urban airspace scenarios.
## Repository Contents
| Path | Description |
| :-- | :-- |
| `navi_data.pkl` | Canonical PKL file for evaluation. |
| `viewer-00000-of-00001.parquet` | Parquet representation for the Hugging Face Dataset Viewer table. |
| `images/` | Trajectory-aligned image release, distributed as five ZIP archives plus a manifest file. |
## Data Fields
The canonical PKL file stores a list of Python dictionaries. Each sample contains the following fields:
| Field | Type | Description |
| :-- | :-- | :-- |
| `sample_index` | `int` | Sample index used for viewer browsing and image archive alignment. |
| `start_pos` | `float[3]` | Initial drone world position `(x, y, z)`. |
| `start_rot` | `float[3]` | Initial drone orientation `(roll, pitch, yaw)` in radians. |
| `start_ang` | `float` | Initial camera gimbal angle in degrees. |
| `task_desc` | `str` | Natural-language navigation instruction. |
| `target_pos` | `float[3]` | Target world position `(x, y, z)`. |
| `gt_traj` | `float[N,3]` | Ground-truth trajectory points. |
| `gt_traj_len` | `float` | Ground-truth trajectory length. |
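
The field layout above can be checked programmatically. The following is a minimal sketch under stated assumptions: `check_sample` and `polyline_length` are hypothetical helper names, the sample is synthetic, and treating `gt_traj_len` as the summed Euclidean distance between consecutive `gt_traj` points is an assumption that this card does not confirm.

```python
import math

# Expected fields from the Data Fields table.
# Value = required number of components, or None for scalars,
# strings, and variable-length fields.
EXPECTED_FIELDS = {
    "sample_index": None,
    "start_pos": 3,
    "start_rot": 3,
    "start_ang": None,
    "task_desc": None,
    "target_pos": 3,
    "gt_traj": None,
    "gt_traj_len": None,
}


def check_sample(sample):
    """Return a list of problems found in one sample dict (empty if none)."""
    problems = []
    for name, length in EXPECTED_FIELDS.items():
        if name not in sample:
            problems.append(f"missing field: {name}")
        elif length is not None and len(sample[name]) != length:
            problems.append(f"{name} should have {length} values")
    for point in sample.get("gt_traj", []):
        if len(point) != 3:
            problems.append("gt_traj points should be (x, y, z)")
            break
    return problems


def polyline_length(points):
    """Sum of Euclidean distances between consecutive 3D points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Synthetic sample for illustration only; not taken from the dataset.
example = {
    "sample_index": 0,
    "start_pos": [0.0, 0.0, 10.0],
    "start_rot": [0.0, 0.0, math.pi / 2],  # radians
    "start_ang": -30.0,                    # degrees
    "task_desc": "Fly to the rooftop of the red building.",
    "target_pos": [3.0, 4.0, 10.0],
    "gt_traj": [[0.0, 0.0, 10.0], [3.0, 0.0, 10.0], [3.0, 4.0, 10.0]],
    "gt_traj_len": 7.0,
}

print(check_sample(example))                   # list of problems, empty here
print(polyline_length(example["gt_traj"]))     # polyline length of the path
print(math.degrees(example["start_rot"][2]))   # yaw converted to degrees
```

Note the mixed units: `start_rot` is given in radians while `start_ang` is given in degrees, so any code that combines the two needs an explicit conversion.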
The Parquet table includes the same structured fields as the PKL file, plus convenience columns such as `start_x`, `start_y`, `start_z`, `target_x`, `target_y`, `target_z`, and `gt_traj_num_points`. It is provided for browsing and visualization in the Hugging Face Dataset Viewer.
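
The relationship between the PKL vectors and the flattened viewer columns can be illustrated as follows. This is a sketch only: `flatten_sample` is a hypothetical helper, and the mapping itself is an assumption inferred from the column names, namely that `start_x`, `start_y`, `start_z` and the `target_*` columns are the components of `start_pos` and `target_pos`, and that `gt_traj_num_points` is the number of points in `gt_traj`.

```python
def flatten_sample(sample):
    """Derive viewer-style convenience columns from one PKL-style sample.

    The column semantics are assumed from their names, not confirmed
    by the dataset card.
    """
    sx, sy, sz = sample["start_pos"]
    tx, ty, tz = sample["target_pos"]
    return {
        "sample_index": sample["sample_index"],
        "start_x": sx,
        "start_y": sy,
        "start_z": sz,
        "target_x": tx,
        "target_y": ty,
        "target_z": tz,
        "gt_traj_num_points": len(sample["gt_traj"]),
    }


# Synthetic values for illustration only.
row = flatten_sample({
    "sample_index": 42,
    "start_pos": [1.0, 2.0, 3.0],
    "target_pos": [4.0, 5.0, 6.0],
    "gt_traj": [[1.0, 2.0, 3.0], [2.5, 3.5, 4.5], [4.0, 5.0, 6.0]],
})
print(row)
```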
## Trajectory-Aligned Images
Trajectory-aligned image archives are available under [`images/`](https://huggingface.co/datasets/EmbodiedCity/EmbodiedNav-Bench/tree/main/images).
This release is about 56.7 GB and is distributed as five ZIP archives together with `merged_upload_images_zip_manifest.json`.
After extraction, the folders named `0` through `5036` correspond directly to the `sample_index` field in `navi_data.pkl` and the viewer table.
| Archive | Sample index range |
| :-- | :-- |
| `merged_upload_images_part01_0000-1007.zip` | `0-1007` |
| `merged_upload_images_part02_1008-2015.zip` | `1008-2015` |
| `merged_upload_images_part03_2016-3022.zip` | `2016-3022` |
| `merged_upload_images_part04_3023-4029.zip` | `3023-4029` |
| `merged_upload_images_part05_4030-5036.zip` | `4030-5036` |
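
To fetch images for only a few samples, the table above can be turned into a lookup so that just the relevant archive is downloaded. The sketch below copies the archive names and index ranges from the table; `archive_for` is a hypothetical helper name, and the directory layout inside each ZIP is not described here, so the code only resolves the archive name.

```python
# (archive name, first sample index, last sample index), from the table above.
ARCHIVES = [
    ("merged_upload_images_part01_0000-1007.zip", 0, 1007),
    ("merged_upload_images_part02_1008-2015.zip", 1008, 2015),
    ("merged_upload_images_part03_2016-3022.zip", 2016, 3022),
    ("merged_upload_images_part04_3023-4029.zip", 3023, 4029),
    ("merged_upload_images_part05_4030-5036.zip", 4030, 5036),
]


def archive_for(sample_index):
    """Return the name of the ZIP archive that covers a given sample index."""
    for name, first, last in ARCHIVES:
        if first <= sample_index <= last:
            return name
    raise ValueError(f"sample_index {sample_index} is outside 0-5036")


# Total number of indices covered by the five archives.
total_covered = sum(last - first + 1 for _, first, last in ARCHIVES)

print(archive_for(2500))
print(total_covered)
```

The ranges are contiguous and together cover all 5,037 sample indices.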
## Usage
<!-- The Dataset Viewer-compatible table can be loaded with the `datasets` library:
```python
from datasets import load_dataset
ds = load_dataset("EmbodiedCity/EmbodiedNav-Bench", split="test")
print(ds[0])
```
-->
For evaluation, use `navi_data.pkl` as the canonical data file and follow the setup instructions in the GitHub project repository.
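
A minimal sketch of reading the canonical file with Python's standard `pickle` module is shown below. It assumes `navi_data.pkl` has already been downloaded to a local path and that it unpickles to a list of dictionaries as described under Data Fields. `load_samples` is a hypothetical helper name, and the demonstration writes a small synthetic stand-in file rather than using the real data. Because unpickling can execute arbitrary code, only load files obtained from a source you trust.

```python
import os
import pickle
import tempfile


def load_samples(path):
    """Load the list of sample dicts from a pickle file."""
    with open(path, "rb") as f:
        samples = pickle.load(f)
    if not isinstance(samples, list):
        raise TypeError(
            f"expected a list of samples, got {type(samples).__name__}"
        )
    return samples


# Demonstration with a synthetic stand-in for navi_data.pkl.
demo = [
    {
        "sample_index": 0,
        "task_desc": "Fly to the tower.",
        "gt_traj": [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    }
]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "navi_data_demo.pkl")
    with open(demo_path, "wb") as f:
        pickle.dump(demo, f)
    samples = load_samples(demo_path)

print(len(samples), samples[0]["task_desc"])
```

For the real file, replacing `demo_path` with the local path to `navi_data.pkl` should be sufficient, provided any libraries referenced by the pickled objects are installed.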
## License
This dataset is released under the CC-BY-4.0 license.
## Citation
```bibtex
@misc{zhao2026farlargemultimodalmodels,
title={How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace},
author={Baining Zhao and Ziyou Wang and Jianjie Fang and Zile Zhou and Yanggang Xu and Yatai Ji and Jiacheng Xu and Qian Zhang and Weichen Zhang and Chen Gao and Xinlei Chen},
year={2026},
eprint={2604.07973},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/html/2604.07973v1},
}
```
Provider:
EmbodiedCity



