Arena-G1-Loco-Manipulation-Task
Source: ModelScope community (updated 2025-12-04, indexed 2025-12-06)
Download link:
https://modelscope.cn/datasets/nv-community/Arena-G1-Loco-Manipulation-Task
## Dataset Description:
The Arena-G1-Loco-Manipulation-Task dataset is a multimodal collection of trajectories generated in Isaac Lab. It supports a humanoid (G1) loco-manipulation task in the IsaacLab-Arena environment. Each entry provides the full context (state, vision, language, action) needed to train and evaluate generalist robot policies for a box pick-and-place task.
| Dataset Name | # Trajectories |
|---------------------------|----------------|
| G1 Loco-Manipulation Task | 50 |
This dataset is suited to behavior cloning, policy learning, and generalist robotic (loco-)manipulation research. It has been used for post-training the GR00T N1.5 model.
This dataset is ready for commercial use.
## Dataset Owner
NVIDIA Corporation
## Dataset Creation Date:
10/10/2025
## License/Terms of Use:
This dataset is governed by the Creative Commons Attribution 4.0 International License (CC-BY-4.0).
## Intended Usage:
This dataset is intended for:
- Training robot manipulation policies using behavior cloning.
- Research in generalist robotics and task-conditioned agents.
- Sim-to-real / Sim-to-Sim transfer studies.
## Dataset Characterization:
### Data Collection Method
- Automated
- Automatic/Sensors
- Synthetic
Five human teleoperated demonstrations were collected in Isaac Lab using a depth camera and a keyboard. All 50 demos in the dataset were generated automatically using MimicGen [1], a synthetic motion trajectory generation framework. Each demo is generated at 50 Hz.
### Labeling Method
Not Applicable
## Dataset Format:
We provide the following dataset files:
- an HDF5 file with 5 human-annotated demonstrations (`arena_g1_loco_manipulation_dataset_annotated.hdf5`)
- an HDF5 file with 1 Mimic-generated demonstration (`arena_g1_loco_manipulation_dataset_generated_small.hdf5`)
- an HDF5 file with 50 Mimic-generated demonstrations (`arena_g1_loco_manipulation_dataset_generated.hdf5`)
- a GR00T-Lerobot formatted dataset converted from the Mimic-generated HDF5 file (`lerobot`)
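The internal group layout of the HDF5 files is not described here, so a reasonable first step is to list whatever datasets a file contains. The following is a minimal sketch, assuming the third-party `h5py` package is installed; the helper name `list_datasets` is hypothetical and only for illustration.

```python
import h5py


def list_datasets(path):
    """Return (name, shape, dtype) for every dataset stored in an HDF5 file."""
    found = []

    def visit(name, obj):
        # visititems calls this for every group and dataset in the file;
        # only datasets carry array data, so groups are skipped.
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found


# Usage, with a file name taken from the list above:
# for name, shape, dtype in list_datasets(
#         "arena_g1_loco_manipulation_dataset_generated_small.hdf5"):
#     print(name, shape, dtype)
```

Starting with the single-demonstration file keeps the listing short before moving on to the 50-demonstration file.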
Each demo in the GR00T-Lerobot dataset consists of a time-indexed sequence of the following modalities:
### Actions
- action (FP64): desired joint positions for all body joints (26 DoF)
### Observations
- observation.state (FP64): joint positions for all body joints (26 DoF)
### Task-specific
- timestamp (FP64): simulation time in seconds of each recorded data entry.
- annotation.human.action.task_description (INT64): index referring to the language instruction recorded in the metadata
- annotation.human.action.valid (INT64): index indicating the validity of the annotation recorded in the metadata
- episode_index (INT64): index indicating the order of each demo
- task_index (INT64): index used by the multi-task data loader. Not applicable to GR00T N1 post-training; always set to 0.
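The field descriptions above imply a few simple invariants: 26 values per action and per state vector, and timestamps spaced 1/50 s apart. The sketch below checks them on one frame. It assumes the frame has already been loaded into a plain dict keyed by the field names above, and that timestamps start at 0 in each episode. Neither assumption is stated in this card, and `check_frame` is a hypothetical helper.

```python
import math

DOF = 26        # from the card: 26 body joints
RATE_HZ = 50.0  # from the card: demos are generated at 50 Hz


def check_frame(frame, frame_index, tol=1e-6):
    """Return a list of problems found in one frame dict (empty if none)."""
    problems = []
    for key in ("action", "observation.state"):
        values = frame.get(key)
        if values is None or len(values) != DOF:
            problems.append(f"{key}: expected {DOF} values")
    # Assumption: the first frame of an episode has timestamp 0.
    expected_t = frame_index / RATE_HZ
    t = frame.get("timestamp")
    if t is None or not math.isclose(t, expected_t, abs_tol=tol):
        problems.append(f"timestamp: expected about {expected_t:.3f} s")
    return problems


# Made-up frame for illustration; not taken from the dataset.
frame = {
    "action": [0.0] * DOF,
    "observation.state": [0.0] * DOF,
    "timestamp": 0.06,
}
print(check_frame(frame, frame_index=3))  # prints []
```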
### Videos
- 256 x 256 RGB videos in mp4 format from a first-person-view camera
In addition, the following metadata files are provided:
- `episodes.jsonl` contains a list of all the episodes in the entire dataset. Each episode contains a list of tasks and the length of the episode.
- `tasks.jsonl` contains a list of all the tasks in the entire dataset.
- `modality.json` contains the modality configuration.
- `info.json` contains the dataset information.
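As a rough illustration of how the metadata could be used, the sketch below parses JSON Lines text in the style of `episodes.jsonl` and totals the episode lengths. The key names (`episode_index`, `tasks`, `length`) and the reading of `length` as a frame count are assumptions based on common LeRobot-style metadata, not something this card specifies, and the sample rows are made up.

```python
import json


def read_jsonl(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def summarize_episodes(episodes, rate_hz=50.0):
    """Return (episode count, total frames, total seconds at rate_hz)."""
    # Assumption: each record has a "length" key holding a frame count.
    total_frames = sum(ep["length"] for ep in episodes)
    return len(episodes), total_frames, total_frames / rate_hz


# Made-up sample rows for illustration; not taken from the dataset.
sample = (
    '{"episode_index": 0, "tasks": ["example task"], "length": 500}\n'
    '{"episode_index": 1, "tasks": ["example task"], "length": 250}\n'
)
episodes = read_jsonl(sample)
print(summarize_episodes(episodes))  # prints (2, 750, 15.0)
```

With the real file, the text would come from reading `episodes.jsonl` inside the `lerobot` directory; check the actual keys in `info.json` and the first record before relying on them.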
## Dataset Quantification:
### Record Count
#### G1 Loco-Manipulation Task
- Number of demonstrations/trajectories: 50
- Number of RGB videos: 50
### Total Storage
24.1 GB
## Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
## Reference(s):
[1] @inproceedings{mandlekar2023mimicgen,
title={MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations},
author={Mandlekar, Ajay and Nasiriany, Soroush and Wen, Bowen and Akinola, Iretiayo and Narang, Yashraj and Fan, Linxi and Zhu, Yuke and Fox, Dieter},
booktitle={7th Annual Conference on Robot Learning},
year={2023}
}
Provider: maas
Created: 2025-12-02



