Download link:

https://modelscope.cn/datasets/InternRobotics/VL-LN-Bench


Resource description:

# VL-LN Bench

[![Code](https://img.shields.io/badge/GitHub-VL--LN--Bench-181717?logo=github)](https://github.com/InternRobotics/VL-LN)
[![VL-LN Paper - arXiv](https://img.shields.io/badge/arXiv-VL--LN--Bench-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.22342)
[![Project Page - VL-LN-Bench](https://img.shields.io/badge/Project_Page-VL--LN--Bench-4285F4?logo=google-chrome&logoColor=white)](https://0309hws.github.io/VL-LN.github.io/)
[![Model](https://img.shields.io/badge/Model-VL--LN--Bench-FF6F00?logo=huggingface&logoColor=white)](https://huggingface.co/InternRobotics/VL-LN-Bench-basemodel)

VL-LN Bench is the first large-scale benchmark for **Interactive Instance Goal Navigation (IIGN)**, where an embodied agent must locate a specific instance in a realistic 3D house while engaging in **free-form natural language dialog**. The dataset is built on Matterport3D scenes with MMScan meta-annotations and provides both **ambiguous category-only instructions** (for IIGN) and **full instance-level descriptions** (for Instance Goal Navigation, IGN), enabling training and evaluation of agents that both navigate and ask questions.

<img src="images/iion.png" alt="Overview of VL-LN Bench and the IIGN task" width="1000">

An example of the IIGN task. The oracle (top left) first gives a simple goal-oriented navigation instruction (“Search for the chair.”). The agent must find the specific instance of the given category (chair). Along the way, the agent can ask questions to progressively resolve ambiguity and navigate to the correct target.

## 🔑 Key Features

- **Large-scale, dialog-enabled dataset**
  VL-LN Bench contains 20,476 object instances (112 categories) and 3,785 start positions, forming over 330,000 episodes. We also provide an automatic data-generation pipeline that combines a frontier-based exploration agent with a scripted oracle, so users can easily scale up dialog-augmented trajectories with both navigation and dialog annotations.
- **Two instance-level navigation benchmarks (IIGN & IGN)**
  VL-LN Bench defines two complementary evaluation tracks for instance goal navigation: IIGN, an interactive track with partially specified, category-only goals, and IGN, a non-interactive track with fully specified, unambiguous descriptions. Both tracks share the same scenes and targets, enabling controlled comparison between policies with and without interaction.

## 🧾 TODO List

- [x] Release train/val splits of VL-LN Bench
- [x] Release evaluation code
- [x] Release training code
- [x] Release data generation pipeline

## 📄 Table of Contents

- [VL-LN Bench](#vl-ln-bench)
  - [🔑 Key Features](#key-features)
  - [🧾 TODO List](#todo-list)
  - [🚀 Quick Start](#quick-start)
  - [📁 Dataset Structure](#dataset-structure)
    - [Branch Structure](#branch-structure)
    - [Core Dataset Structure](#core-dataset-structure)
    - [Dataset Summary Table](#dataset-summary-table)
  - [📜 License and Citation](#license-and-citation)

## 🚀 Quick Start

We provide three main components in this repo: the **validation set** (`raw_data/mp3d/val_unseen/`), the **training set** (`raw_data/mp3d/train/`), and the **collected dialog-augmented trajectories** (`traj_data/`).

The validation and training sets are stored as `*.json.gz` files and can be used directly with the Habitat simulator.

The collected trajectories are designed for policy training and consist of **RGB-D images** and **annotations**. The images are captured in Habitat, and each trajectory comes with two camera views: a **front-facing (0°)** view and a **tilted (30° down)** view. The annotation files contain the remaining trajectory information: scene ID, instruction, action sequence, pixel-level goal sequence, dialog, and camera pose.
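To get a first look at an episode file outside the simulator, the sketch below opens a `*.json.gz` file with the Python standard library and reports its top-level keys and episode count. It assumes the file decodes to a JSON object with an `episodes` list, which is the usual Habitat dataset convention but is not confirmed by this README; `summarize_episode_file` is a hypothetical helper name, and the path in the usage section assumes the repository was cloned into `VL-LN-Bench/`.

```python
import gzip
import json
from pathlib import Path


def summarize_episode_file(path):
    """Return the top-level keys and the episode count of a *.json.gz file."""
    with gzip.open(Path(path), "rt", encoding="utf-8") as handle:
        payload = json.load(handle)
    # Assumption: Habitat-style files keep their episodes under an "episodes" key.
    if isinstance(payload, dict):
        keys = sorted(payload)
        episodes = payload.get("episodes", [])
    else:
        # Fall back to treating a bare JSON list as the episode list itself.
        keys = []
        episodes = payload
    return {"top_level_keys": keys, "num_episodes": len(episodes)}


if __name__ == "__main__":
    # Assumed local path after cloning the dataset repository.
    target = Path("VL-LN-Bench/raw_data/mp3d/val_unseen/val_unseen_iign.json.gz")
    if target.exists():
        print(summarize_episode_file(target))
    else:
        print(f"{target} not found; download the dataset first.")
```

Inspecting the keys before wiring the file into a Habitat config is a cheap way to confirm that the layout matches what your loader expects.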
The statistics of the collected training trajectories are shown below:

<img src="images/statics.png" alt="Statistics of the collected training trajectories" width="1000">

### Download the Full Dataset

To download the complete VL-LN Bench dataset:

```bash
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install

# Clone the full dataset repository
git clone https://huggingface.co/datasets/InternRobotics/VL-LN-Bench
```

### Download Specific Components

To save bandwidth and storage, you can download only the components you need.

### Individual Files (via huggingface-hub)

Use [huggingface-hub](https://huggingface.co/docs/huggingface_hub/guides/download) to download individual files (you must accept the gated license first):

```python
# Example: download only the IIGN validation episodes
from huggingface_hub import hf_hub_download

# Download the file and retrieve its local path
file_path = hf_hub_download(
    repo_id="InternRobotics/VL-LN-Bench",
    filename="raw_data/mp3d/val_unseen/val_unseen_iign.json.gz",
    revision="main",      # Branch, tag, or commit to download from
    repo_type="dataset",  # Explicitly mark this as a dataset repo
)
print("Local file path:", file_path)
```

### Selective Components

To download only the trajectory data for a specific split:

```bash
# Clone with LFS pointers only, then pull specific data
GIT_LFS_SKIP_SMUDGE=1 git clone -b main https://huggingface.co/datasets/InternRobotics/VL-LN-Bench
cd VL-LN-Bench

# Pull only Split 1 trajectory data
git lfs pull --include="traj_data/mp3d_split1/**,traj_data_30deg/mp3d_split1/**"
```

## 📁 Dataset Structure

### Branch Structure

```
Branches:
├── main  # Latest dataset release
```

### Core Dataset Structure

This repository contains the VL-LN Bench dataset, which is organized into two main components: `raw_data` and `traj_data`.
```
VL-LN-Bench/
├── raw_data/
│   └── <scene_datasets>/
│       ├── scene_summary/
│       ├── train/
│       │   ├── train_ign.json.gz
│       │   └── train_iign.json.gz
│       └── val_unseen/
│           ├── val_unseen_ign.json.gz
│           └── val_unseen_iign.json.gz
└── traj_data/
    └── <scene_datasets>/
        ├── <scene>/
        │   ├── data/
        │   │   ├── chunk-000
        │   │   │   ├── episode_000000.parquet  # includes goal and pose information
        │   │   │   ├── episode_000001.parquet
        │   │   │   └── ...
        │   │   ├── chunk-001
        │   │   │   ├── episode_001000.parquet
        │   │   │   ├── episode_001001.parquet
        │   │   │   └── ...
        │   │   └── chunk-...
        │   ├── meta/
        │   │   ├── episodes_stats.jsonl
        │   │   ├── episodes.jsonl
        │   │   ├── info.json
        │   │   └── tasks.jsonl
        │   └── videos
        │       ├── chunk-000
        │       │   ├── episode_000000
        │       │   │   ├── observation.images.rgb.125cm_0deg
        │       │   │   │   ├── episode_000000_0.jpg  # 000000 is the trajectory ID; 0 is the image ID within this trajectory
        │       │   │   │   ├── episode_000000_1.jpg
        │       │   │   │   └── ...
        │       │   │   ├── observation.images.depth.125cm_0deg
        │       │   │   │   ├── episode_000000_0.png
        │       │   │   │   ├── episode_000000_1.png
        │       │   │   │   └── ...
        │       │   │   ├── observation.images.rgb.125cm_30deg
        │       │   │   └── observation.images.depth.125cm_30deg
        │       │   └── episode_...
        │       └── chunk-...
        └── ...
```

> **Note:** Due to the dataset’s large size, all data is packaged into `<scene>.tar.gz` files to simplify downloading. To use the data, extract all compressed files inside each `<scene_datasets>` directory into that same `<scene_datasets>` folder, so that the resulting directory structure matches the layout shown above.
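The extraction step in the note can be scripted. The following is a minimal sketch using only the Python standard library: it unpacks every `*.tar.gz` archive found directly inside one `<scene_datasets>` directory into that same directory. The helper name `extract_scene_archives` and the example path `VL-LN-Bench/traj_data/mp3d_split1` are illustrative assumptions, and the sketch does not check that each archive actually contains a top-level `<scene>/` folder.

```python
import tarfile
from pathlib import Path


def extract_scene_archives(scene_dataset_dir):
    """Extract every *.tar.gz in scene_dataset_dir into that same directory.

    Returns the names of the archives that were extracted, in sorted order.
    """
    root = Path(scene_dataset_dir)
    extracted = []
    for archive in sorted(root.glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Python versions: reject absolute paths and path traversal.
                tar.extractall(path=root, filter="data")
            else:
                # Older Python versions have no extraction filters.
                tar.extractall(path=root)
        extracted.append(archive.name)
    return extracted


if __name__ == "__main__":
    # Assumed location of one <scene_datasets> directory after cloning.
    target = Path("VL-LN-Bench/traj_data/mp3d_split1")
    if target.is_dir():
        print(extract_scene_archives(target))
    else:
        print(f"{target} not found; pull the trajectory data first.")
```

The archives are left in place after extraction; delete them separately if disk space is a concern.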
### Dataset Summary Table

| Split | Episodes | Key Features | Data Location |
| ------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------- |
| val_unseen | 500 | Validation episodes in VL-LN Bench (IIGN & IGN) | `raw_data/mp3d/val_unseen/` |
| train | 20,476 instances; 3,785 start positions; 240,000+ episodes | Start–instance pairs that are guaranteed to be connected/reachable | `raw_data/mp3d/train/` |
| train (with trajectories) | 40,000+ | Subset of `train` episodes with generated RGB trajectories and annotations | `traj_data/` |

## 📜 License and Citation

All the data and code within this repo are under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). Please consider citing our project if it helps your research.

```bibtex
@misc{huang2025vllnbenchlonghorizongoaloriented,
  title={VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs},
  author={Wensi Huang and Shaohao Zhu and Meng Wei and Jinming Xu and Xihui Liu and Hanqing Wang and Tai Wang and Feng Zhao and Jiangmiao Pang},
  year={2025},
  eprint={2512.22342},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2512.22342},
}
```

> **Note**: To access this dataset, you must agree to the InternData-N1 COMMUNITY LICENSE AGREEMENT and provide the required contact information as specified in the gated access form. The information you provide will be collected, stored, processed and shared in accordance with the InternData Privacy Policy.


Application scenarios: