GenieSim3.0-Dataset

Name: GenieSim3.0-Dataset
Creator: maas
Published: 2026-05-17 01:50:24
License: (no description provided)

ModelScope Community; updated 2026-05-17; indexed 2026-01-10

Download link:

https://modelscope.cn/datasets/agibot_world/GenieSim3.0-Dataset


Resource description:

# GenieSim 3.0 Dataset

<div style="display: flex; justify-content: center; align-items: center; margin: 20px 0;">
  <video controls autoplay loop muted width="100%" style="max-width: 100%; border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);">
    <source src="dataset.mp4" type="video/mp4">
    Your browser does not support the video tag.
  </video>
</div>

<div align="center" style="margin: 24px 0;">
  <a href="https://arxiv.org/abs/2601.02078" style="text-decoration:none;">
    <img src="https://img.shields.io/badge/arXiv-2601.02078-red.svg?logo=arxiv&logoColor=white" alt="arXiv Paper: 2601.02078" style="vertical-align: middle; margin-right: 24px; height: 25px;">
  </a>
  <a href="https://agibot-world.com/genie-sim" style="text-decoration:none;">
    <img src="https://img.shields.io/badge/Project%20Page-genie--sim-1976d2?logo=githubpages&logoColor=white" alt="Project Homepage" style="vertical-align: middle; margin-right: 24px; height: 25px;">
  </a>
  <a href="https://github.com/AgibotTech/genie_sim" style="text-decoration:none;">
    <img src="https://img.shields.io/badge/Code-GitHub-181717?logo=github&logoColor=white" alt="Code Repository" style="vertical-align: middle; margin-right: 24px; height: 25px;">
  </a>
  <a href="https://modelscope.cn/datasets/agibot_world/GenieSim3.0-Dataset" style="text-decoration:none;">
    <img src="https://img.shields.io/badge/Dataset-ModelScope-FF6B35?logo=model&logoColor=white" alt="ModelScope Dataset" style="vertical-align: middle; height: 25px;">
  </a>
</div>

<strong>GenieSim 3.0 Dataset</strong> is a large-scale robotic manipulation simulation dataset built on Isaac Sim, covering diverse manipulation task scenarios and providing high-quality training data for embodied intelligence research.

## 📖 Dataset Introduction

**GenieSim 3.0 Dataset** represents the largest open-source simulation dataset in embodied AI. GenieSim open-sources over **10,000 hours** of simulation data across **200+ tasks**, featuring multi-sensor streams including RGB-D, stereo vision, and whole-body kinematics, alongside multi-dimensional variations in layout, noise, and lighting.

Task complexity is organized hierarchically, supporting a structured progression from simple to complex scenarios. A key design principle is **composability**: long-horizon tasks can be decomposed into sequences of fundamental sub-tasks contained within the dataset. To this end, we structure the task taxonomy along three primary axes: **manipulation skill**, **cognitive comprehension**, and **task complexity**.

To address the sim-to-real gap, a critical concern in the field, we conducted comprehensive sim-to-real validation. Models trained on the synthetic data exhibit **zero-shot sim-to-real transfer**, with higher task success rates than models trained on real data, demonstrating the practical utility of our simulation dataset for real-world robotic applications.

## 🔑 Key Features

- **Large-scale simulation data**: Over 10,000 hours of high-quality simulation demonstrations across 200+ manipulation tasks
- **Multi-sensor streams**: RGB-D cameras, stereo vision, and whole-body kinematics data
- **Rich variations**: Multi-dimensional variations in scene layout, sensor noise, and lighting conditions
- **Hierarchical task organization**: Tasks organized from simple to complex, enabling progressive learning
- **Composable task structure**: Long-horizon tasks decomposable into fundamental sub-tasks
- **Three-dimensional task taxonomy**: Structured along manipulation skill, cognitive comprehension, and task complexity axes
- **Proven sim-to-real transfer**: Zero-shot transfer capability with superior performance compared to real data training

## 📁 Dataset Structure

### Folder Hierarchy

The dataset repository contains three main directories at the top level:

```
GenieSim3.0-Dataset/
├── dataset/                                              # Main dataset directory
│   ├── heat_food_microwave/                              # Task name
│   │   ├── g2/                                           # Robot type
│   │   │   ├── data/                                     # Episode data files
│   │   │   │   ├── chunk-000/                            # Data chunk 0
│   │   │   │   │   ├── episode_000000.parquet
│   │   │   │   │   ├── episode_000001.parquet
│   │   │   │   │   ├── episode_000002.parquet
│   │   │   │   │   └── ...
│   │   │   │   ├── chunk-001/                            # Data chunk 1
│   │   │   │   │   ├── episode_001000.parquet
│   │   │   │   │   ├── episode_001001.parquet
│   │   │   │   │   └── ...
│   │   │   │   └── ...
│   │   │   ├── meta/                                     # Metadata files
│   │   │   │   ├── info.json                             # Dataset information and feature definitions
│   │   │   │   ├── episodes.jsonl                        # Episode metadata (task, length, etc.)
│   │   │   │   ├── episodes_stats.jsonl                  # Per-episode statistics
│   │   │   │   └── tasks.jsonl                           # Task definitions
│   │   │   └── videos/                                   # Video recordings
│   │   │       ├── chunk-000/                            # Video chunk 0
│   │   │       │   ├── observation.images.top_head/          # Top head camera videos
│   │   │       │   │   ├── episode_000000.mp4
│   │   │       │   │   ├── episode_000001.mp4
│   │   │       │   │   └── ...
│   │   │       │   ├── observation.images.hand_left/         # Left hand camera videos
│   │   │       │   │   └── ...
│   │   │       │   ├── observation.images.hand_right/        # Right hand camera videos
│   │   │       │   │   └── ...
│   │   │       │   ├── observation.images.head_depth/        # Head depth camera videos
│   │   │       │   │   └── ...
│   │   │       │   ├── observation.images.hand_left_depth/   # Left hand depth videos
│   │   │       │   │   └── ...
│   │   │       │   └── observation.images.hand_right_depth/  # Right hand depth videos
│   │   │       │       └── ...
│   │   │       ├── chunk-001/                            # Video chunk 1
│   │   │       │   ├── observation.images.top_head/
│   │   │       │   │   └── ...
│   │   │       │   └── ...
│   │   │       └── ...
│   │   └── ...                                           # Other robot types
│   ├── organize_kitchen_utensils/                        # Another task name
│   │   ├── g2/                                           # Robot type
│   │   │   ├── data/
│   │   │   ├── meta/
│   │   │   └── videos/
│   │   └── ...
│   └── ...                                               # Other tasks
├── checkpoints/                                          # Trained model checkpoints
│   └── ...                                               # PI 0.5 models trained on simulation data
└── reconstruction_source_data/                           # 3DGS reconstruction source data
    └── ...                                               # Original data for 3D Gaussian Splatting reconstruction
```

The dataset follows the [LeRobot](https://github.com/huggingface/lerobot) v0.3.3 format, which provides a standardized structure for robotic manipulation datasets. Within the `dataset/` directory, data is organized by task name and robot type, with each task-robot combination containing episode data in Parquet format, metadata files (info.json, episodes.jsonl, episodes_stats.jsonl, tasks.jsonl), and video recordings organized by chunk and camera view.

The repository also includes:

- **checkpoints/**: Pre-trained PI 0.5 models trained on the simulation data. These models are the ones referenced in the paper that demonstrate **zero-shot sim-to-real transfer with superior task success rates compared to real data**. The large-scale, systematically randomized synthetic data pipeline implemented in GenieSim 3.0 offers a pragmatic and scalable alternative to real-robot data collection, enabling the training of robust models capable of high zero-shot performance in physical environments.
- **reconstruction_source_data/**: Original data used for 3D Gaussian Splatting (3DGS) reconstruction

## 📄 License and Citation

All the data and code within this repo are under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). Please consider citing our project if it helps your research:

```BibTeX
@misc{yin2026geniesim30,
      title={Genie Sim 3.0 : A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot},
      author={Chenghao Yin and Da Huang and Di Yang and Jichao Wang and Nanshu Zhao and Chen Xu and Wenjun Sun and Linjie Hou and Zhijun Li and Junhui Wu and Zhaobo Liu and Zhen Xiao and Sheng Zhang and Lei Bao and Rui Feng and Zhenquan Pang and Jiayu Li and Qian Wang and Maoqing Yao},
      year={2026},
      eprint={2601.02078},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2601.02078},
}
```
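
## Example: locating episode files

The folder hierarchy in the Dataset Structure section maps an episode index to a Parquet file and one video per camera. The following is a minimal sketch of that mapping, not code from the GenieSim repository. The helper names `episode_paths` and `read_jsonl` are hypothetical. The chunk size of 1000 is an assumption inferred from `episode_001000.parquet` being the first file listed under `chunk-001/`; the authoritative value should be read from `meta/info.json`.

```python
import json
from pathlib import Path

# Assumption: 1000 episodes per chunk, inferred from the directory listing
# (episode_001000 sits in chunk-001). Verify against meta/info.json.
ASSUMED_CHUNK_SIZE = 1000


def episode_paths(root, task, robot, episode_index,
                  camera="observation.images.top_head",
                  chunk_size=ASSUMED_CHUNK_SIZE):
    """Return (parquet_path, video_path) for one episode in the layout above."""
    base = Path(root) / "dataset" / task / robot
    chunk = f"chunk-{episode_index // chunk_size:03d}"
    name = f"episode_{episode_index:06d}"
    parquet = base / "data" / chunk / f"{name}.parquet"
    video = base / "videos" / chunk / camera / f"{name}.mp4"
    return parquet, video


def read_jsonl(path):
    """Parse a JSON Lines file (e.g. meta/episodes.jsonl) into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Usage: build the paths for episode 1001 of the heat_food_microwave task.
parquet, video = episode_paths(
    "GenieSim3.0-Dataset", "heat_food_microwave", "g2", 1001
)
print(parquet.as_posix())
print(video.as_posix())
```

Keeping the chunk size as a parameter means the sketch still applies if `meta/info.json` reports a different value. The field names inside the `.jsonl` metadata files are not listed in this README, so `read_jsonl` returns the raw records without assuming any keys.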


Provider:

maas

Created:

2026-01-05

Summary

Dataset introduction

Background and challenges

Background overview

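GenieSim 3.0 Dataset is a large-scale robotic manipulation simulation dataset built on Isaac Sim and designed for embodied intelligence research. It provides over 10,000 hours of simulation data covering more than 200 tasks, with multi-sensor streams and rich environment variations. Its key features include hierarchical task organization, a composable task structure, and validated zero-shot sim-to-real transfer that is reported to outperform training on real data, making it suitable for real-world robotic applications.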

The summary above was collected and generated by 遇见数据集.
