Scene-N1
Source: ModelScope community (魔搭社区). Updated 2025-12-05; indexed 2025-09-20.
Download link:
https://modelscope.cn/datasets/InternRobotics/Scene-N1
Resource description:
<!-- <div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/64e6d9d229a548f66aff6e5b/y9cRIMFKBcY1qsxJ58y7Y.jpeg"/>
</div> -->
# SceneData-N1
<!--
## 🔑 Key Features
- **Unified Format for Different Benchmarks**
InternData-N1 consolidates three subsets—VLN-CE, VLN-PE, and VLN-N1—into the mainstream LeRobot (v2.1) format, facilitating convenient usage across different systems and diverse benchmarks.
- **Diverse Data Covering Different Embodiments, Tasks, and Scenes**
InternData-N1 offers diversity through its foundation of 3,000+ scene assets, extensive randomization across different robot embodiments and viewpoints, and rephrased instructions generated by LLMs for common navigation tasks.
- **High Quality Through Effective Generation and Filtering**
InternData-N1 ensures high quality by employing effective data generation strategies (producing smooth and safe trajectories) and rigorous filtering (excluding samples with very few reference objects). This results in state-of-the-art performance for models trained on it, such as InternVLA-N1.
## 📅 TODO List
- [x] **InternData-N1 subsets**: 2.8k+ VLN-PE, 150k+ VLN-CE, 6k+ VLN-N1 episodes
- [ ] **Release 200k+ VLN-N1** (in 2 weeks)
- [ ] **VLN-CE v1 -> v1.3** (in one month)
## 📋 Table of Contents
- [InternData-N1](#interndata-n1)
- [🔑 Key Features](#-key-features)
- [📅 TODO List](#-todo-list)
- [📋 Table of Contents](#-table-of-contents)
- [🔥 Get Started](#-get-started)
- [Download the Dataset](#download-the-dataset)
- [Dataset Structure](#dataset-structure)
- [Scene Data Assets](#scene-data-assets)
- [Core Dataset Structure](#core-dataset-structure)
## 🔥 Get Started
### Download the Dataset
To download the full dataset, you can use the following commands. If you encounter any issues, please refer to the official Hugging Face documentation.
```bash
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
# When prompted for a password, use an access token with read permissions.
# Generate one from your settings: https://huggingface.co/settings/tokens
git clone https://huggingface.co/datasets/InternRobotics/InternData-N1
# If you want to clone without large files - just their pointers
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/InternRobotics/InternData-N1
``` -->
## Dataset Structure
```
scene_data/
├── gradio_scene_assets/
├── n1_eval_scenes/
│ ├── Materials
│ ├── SkyTexture
│ ├── internscenes_home
│ └── internscenes_commercial
├── mp3d_pe/
├── mp3d_n1/
└── mp3d_ce/
```
- `scene_data/mp3d_pe/`: Improved Matterport3D scene assets for the VLN-PE benchmark.
- `scene_data/mp3d_n1/`: Base Matterport3D scans used to generate the VLN-N1 trajectory data.
- `scene_data/mp3d_ce/`: Matterport3D scene assets for the VLN-CE benchmark.
- `scene_data/n1_eval_scenes/`: Scene assets for the VLN-N1 benchmark.
- `scene_data/gradio_scene_assets/`: A selection of MP3D_CE scenes (with ceilings removed) for rapid interactive testing of the [InternVLA-N1](https://huggingface.co/InternRobotics/InternVLA-N1) model. These simplified environments let users quickly check model behavior in key scenarios.
> **Note**: The original scene datasets can be obtained from [Matterport3D](https://niessner.github.io/Matterport/).
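
After downloading, it can be useful to confirm that the local copy matches the layout shown above. The following is a minimal sketch using only the Python standard library. The function name `missing_scene_entries`, the constant `EXPECTED_ENTRIES`, and the local path `scene_data` are illustrative assumptions, not part of this dataset or any official tooling, and it assumes a complete download contains every entry in the tree.

```python
from pathlib import Path

# Expected entries, copied from the directory tree above.
# Assumption: a complete download contains every one of them.
EXPECTED_ENTRIES = [
    "gradio_scene_assets",
    "n1_eval_scenes/Materials",
    "n1_eval_scenes/SkyTexture",
    "n1_eval_scenes/internscenes_home",
    "n1_eval_scenes/internscenes_commercial",
    "mp3d_pe",
    "mp3d_n1",
    "mp3d_ce",
]


def missing_scene_entries(scene_root):
    """Return the expected entries that do not exist under scene_root."""
    root = Path(scene_root)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]


if __name__ == "__main__":
    # "scene_data" is a hypothetical local path; point it at your own copy.
    missing = missing_scene_entries("scene_data")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print(f"  scene_data/{rel}")
    else:
        print("All expected scene entries are present.")
```

The check only looks at names, not contents, so it catches an incomplete clone (for example, one made without the large files of a subset) but says nothing about whether the individual scene assets are valid.
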
# License and Citation
All data and code in this repo are released under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). Please consider citing our project if it helps your research.
```BibTeX
@misc{scene_n1,
title={Scene-N1 Dataset},
author={Scene-N1 Dataset contributors},
howpublished={\url{https://huggingface.co/datasets/InternRobotics/Scene-N1}},
year={2025}
}
```
Provider: maas
Created: 2025-07-28



