Download link:

https://modelscope.cn/datasets/nv-community/robocasa365-datasets



Resource description:

# Overview of Datasets

RoboCasa offers over 2,200 hours of demonstration data, comprising human teleoperation data and synthetic data. Broadly, the data is split into **pretraining datasets** and **target datasets**. The pretraining datasets feature 300 diverse tasks across 2,500 pretraining kitchens, while the target datasets feature 50 target tasks across a distinct set of 10 held-out target kitchens.

<table class="docutils rc-datasets-summary">
  <caption>Dataset statistics across pretraining and target settings.</caption>
  <thead>
    <tr>
      <th>Setting</th>
      <th>Num Tasks</th>
      <th>Num Scenes</th>
      <th>Demos per Task</th>
      <th>Dataset Size (hrs)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Pretraining (Human)</td>
      <td>300</td>
      <td>2500</td>
      <td>100</td>
      <td>482</td>
    </tr>
    <tr>
      <td>Pretraining (MimicGen - Coming Soon!)</td>
      <td>60</td>
      <td>2500</td>
      <td>10,000</td>
      <td>1615</td>
    </tr>
    <tr>
      <td>Target (Human)</td>
      <td>50</td>
      <td>10</td>
      <td>500</td>
      <td>193</td>
    </tr>
  </tbody>
</table>

We provide a detailed overview of the pretraining and target datasets below.

-------

## Pretraining Datasets

RoboCasa offers ~2,000 hours of pretraining demonstration data. The pretraining datasets feature 300 diverse tasks across 2,500 pretraining kitchens. We feature both human and synthetic datasets:

### Human Datasets

482 hours of data collected via teleoperation. The data spans 300 tasks (65 atomic tasks and 235 composite tasks), with 100 demonstrations per task.

### Synthetic Datasets (Coming Soon!)

1,615 hours of data generated via [MimicGen](https://mimicgen.github.io/). The data spans 60 atomic tasks, with ~10k demonstrations per task. The repository does not currently store the MimicGen datasets; they will be added in the coming weeks.

-------

## Target Datasets

In addition to pretraining data, RoboCasa offers over 193 hours of high-quality demonstration data for target tasks collected via teleoperation. The target datasets feature 50 diverse tasks across 10 distinct target kitchen scenes.
Note that these target scenes are distinct from the pretraining scenes represented in the pretraining datasets. For each task, we provide **500 human demonstrations** collected via teleoperation. We split these datasets into three groups:

* **Atomic-Seen** (18 tasks): 18 atomic tasks, with all tasks also represented in pretraining datasets.
* **Composite-Seen** (16 tasks): 16 composite tasks, with all tasks also represented in pretraining datasets.
* **Composite-Unseen** (16 tasks): 16 composite tasks, only seen in target datasets and not in pretraining datasets.

# Using Datasets

We provide datasets in the LeRobot format. There are broadly three types of datasets: **pretraining (human)** datasets, **pretraining (MimicGen)** datasets, and **target (human)** datasets.

### Downloading datasets

Here are a few examples to download datasets:

<details>
<summary><b>Click to expand download examples</b></summary>

```
# downloads all datasets
python -m robocasa.scripts.download_datasets --all

# only download pretraining human data
python -m robocasa.scripts.download_datasets --split pretrain --source human

# only download pretraining MimicGen data
python -m robocasa.scripts.download_datasets --split pretrain --source mimicgen

# only download target human data
python -m robocasa.scripts.download_datasets --split target --source human

# download all datasets for specific task(s)
python -m robocasa.scripts.download_datasets --tasks PickPlaceCounterToCabinet ArrangeBreadBasket
```

You can specify `--overwrite` to overwrite existing datasets.

</details>

### Dataset structure

RoboCasa datasets follow the LeRobot format.
Here is an overview of important elements of each dataset:

<details>
<summary><b>Click to expand dataset structure</b></summary>

```
lerobot/
├── meta/                                  # Metadata files describing the dataset
│   ├── info.json                          # Dataset info (robot type, episodes, frames, fps, features)
│   ├── tasks.jsonl                        # Language instructions with task indices
│   ├── episodes.jsonl                     # Per-episode metadata (index, instruction, length)
│   ├── episodes_stats.jsonl               # Per-episode statistics for actions/proprioception
│   ├── stats.json                         # Aggregated statistics across all episodes
│   ├── modality.json                      # Info contained in observations and action vectors
│   └── embodiment.json                    # Embodiment information
│
├── data/                                  # Low-dimensional trajectory data (parquet files)
│   └── chunk-<chunk_id>/
│       └── episode_<episode_id>.parquet   # Proprioception, actions, dones, timestamps
│
├── videos/                                # MP4 video files for each camera view
│   └── chunk-<chunk_id>/
│       ├── observation.images.robot0_agentview_left/
│       │   └── episode_<episode_id>.mp4   # Left third-person camera
│       ├── observation.images.robot0_agentview_right/
│       │   └── episode_<episode_id>.mp4   # Right third-person camera
│       └── observation.images.robot0_eye_in_hand/
│           └── episode_<episode_id>.mp4   # Eye-in-hand camera
│
└── extras/                                # MuJoCo/RoboCasa-specific metadata (non-standard)
    ├── dataset_meta.json                  # Environment args and controller configs
    └── episode_<episode_id>/              # Per-episode extras
        ├── ep_meta.json                   # Episode metadata (layout, style, fixtures, objects)
        ├── model.xml.gz                   # Compressed MJCF MuJoCo model XML
        └── states.npz                     # Raw MuJoCo states for replay (not for training)
```

</details>

### Retrieving dataset metadata

We track each dataset with metadata (paths, task horizon length, etc.) in the [dataset registry](https://github.com/robocasa/robocasa-dev/blob/dev/robocasa/utils/dataset_registry.py).
You can use the `get_ds_meta()` function to retrieve metadata for a specific task:

```py
from robocasa.utils.dataset_registry import get_ds_meta

ds_meta = get_ds_meta(
    task="PickPlaceCounterToCabinet",
    split="target",  # or try "pretrain"
    source="human",  # defaults to "human", try "mimicgen" for synthetic data
    demo_fraction=1.0,  # the fraction of available demos to use (default is 1.0)
)
```

### Creating datasets for training

Here is an example script to access dataset elements:

```py
import random

from lerobot.datasets.lerobot_dataset import LeRobotDataset

# get dataset path from ds_meta from previous section
dataset_path = ds_meta["path"]

ds = LeRobotDataset(repo_id="robocasa365", root=dataset_path)

ep_idx = 5  # zero-based episode index
start = int(ds.episode_data_index["from"][ep_idx])
end = int(ds.episode_data_index["to"][ep_idx])

# "to" is treated as an exclusive bound here, so sample an offset in [0, end - start)
timestep_idx = random.randrange(end - start)
sample = ds[start + timestep_idx]  # a random sample from the episode with index 5

right_img = sample["observation.images.robot0_agentview_right"]  # the right camera image
action = sample["action"]  # the action taken
instruction = sample["task"]  # the language instruction for the episode
```

### Training beyond a single dataset

The code above returns metadata for a single dataset. You can retrieve information for a collection of datasets using the `get_ds_soup()` function, which returns a list of dataset metadata:

```py
from robocasa.utils.dataset_registry import get_ds_soup

ds_soup = get_ds_soup(
    task_soup="atomic_seen",  # the list of tasks
    split="target",  # or try "pretrain"
    source="human",  # defaults to "human", try "mimicgen" for synthetic data
    demo_fraction=1.0,  # the fraction of available demos to use (default is 1.0)
)
```

Prominent dataset soups are registered in [the dataset soup registry](https://github.com/robocasa/robocasa-dev/blob/dev/robocasa/utils/dataset_registry.py).
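Because a soup is a list of per-dataset metadata, it can help to confirm that every dataset in it has been downloaded before starting a training run. The following is a minimal sketch, assuming each entry is a dict with a `"path"` key (as in the `ds_meta["path"]` access shown earlier). `split_by_availability` is a hypothetical helper, not part of robocasa, and the example entries are made up for illustration.

```python
import os


def split_by_availability(ds_soup):
    """Partition soup entries by whether their dataset directory exists locally.

    Assumes each entry is a dict with a "path" key (an assumption based on
    the ds_meta["path"] access used elsewhere in this document).
    """
    present, missing = [], []
    for ds_meta in ds_soup:
        if os.path.isdir(ds_meta["path"]):
            present.append(ds_meta)
        else:
            missing.append(ds_meta)
    return present, missing


if __name__ == "__main__":
    # Made-up entries for illustration; in practice, pass the list
    # returned by get_ds_soup().
    example_soup = [
        {"path": os.getcwd()},
        {"path": os.path.join(os.getcwd(), "no_such_dataset_dir")},
    ]
    present, missing = split_by_availability(example_soup)
    print(f"{len(present)} available, {len(missing)} missing")
```

Datasets reported as missing can then be fetched with the download script described under "Downloading datasets".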
To construct a combined dataset from multiple datasets with custom weights, you can reuse the dataloader from the GR00T-N1.5 codebase:

<details>
<summary><b>Click to expand weighted dataset creation</b></summary>

```py
import copy
import os

import numpy as np

from robocasa.utils.dataset_registry import DATASET_SOUP_REGISTRY
from robocasa.utils.groot_utils.groot_dataset import (
    LeRobotMixtureDataset,
    LeRobotSingleDataset,
    ModalityConfig,
)
from robocasa.utils.groot_utils.schema import EmbodimentTag

embodiment_tag = EmbodimentTag("new_embodiment")

# Define configs needed for dataloader to fetch correct data
modality_configs = {
    "video": ModalityConfig(
        delta_indices=[0],
        modality_keys=[
            "video.robot0_agentview_left",
            "video.robot0_agentview_right",
            "video.robot0_eye_in_hand",
        ],
    ),
    "state": ModalityConfig(
        delta_indices=[0],
        modality_keys=[
            "state.end_effector_position_relative",
            "state.end_effector_rotation_relative",
            "state.gripper_qpos",
            "state.base_position",
            "state.base_rotation",
        ],
    ),
    "action": ModalityConfig(
        delta_indices=list(range(16)),
        modality_keys=[
            "action.end_effector_position",
            "action.end_effector_rotation",
            "action.gripper_close",
            "action.base_motion",
            "action.control_mode",
        ],
    ),
    "language": ModalityConfig(
        delta_indices=[0],
        modality_keys=[
            "annotation.human.task_description",
        ],
    ),
}

dataset_soup = "target_atomic_seen"  # specify which dataset soup to use
ds_soup_list = copy.deepcopy(DATASET_SOUP_REGISTRY[dataset_soup])

# Build one single-dataset object per entry in the soup
single_datasets = []
for ds_meta in ds_soup_list:
    ds_path = ds_meta["path"]
    ds_filter_key = ds_meta["filter_key"]
    assert os.path.exists(ds_path), f"Dataset path {ds_path} does not exist"
    dataset = LeRobotSingleDataset(
        dataset_path=ds_path,
        modality_configs=modality_configs,
        embodiment_tag=embodiment_tag,
        filter_key=ds_filter_key,
    )
    single_datasets.append(dataset)

ds_weights = np.ones(len(single_datasets))  # custom weights for datasets
print("dataset weights:", ds_weights)

train_dataset = LeRobotMixtureDataset(
    data_mixture=[
        (dataset, ds_w) for dataset, ds_w in zip(single_datasets, ds_weights)
    ],
    mode="train",
)

# Fetch and print a single item as a sanity check
for item in train_dataset:
    print(item)
    break
```

</details>

### Inspecting and visualizing datasets

To get dataset statistics (filter keys, objects, task language, scenes):

```
python robocasa/scripts/get_dataset_info.py --dataset <ds-path>
```

You can visualize dataset videos by looking at the `videos` folder under each LeRobot dataset directory.

To visualize a dataset and save a video:

```
python robocasa/scripts/playback_dataset.py --n 10 --dataset <ds-path>
```

This will save a video of 10 random demonstrations in the same location as the dataset. You can play back the full dataset by removing the `--n` flag.

For more information about RoboCasa, please visit our [documentation site](https://robocasa.ai/).
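As a companion to browsing the `videos` folder by hand, the sketch below indexes the per-episode parquet and MP4 files of one dataset using only the standard library. It assumes the directory layout shown under "Dataset structure" and matches file names with wildcards, so it makes no assumption about how chunk or episode ids are zero-padded. `index_episode_files` is a hypothetical helper, not part of robocasa or LeRobot.

```python
from pathlib import Path


def index_episode_files(dataset_root):
    """Map each episode file stem to its parquet file and per-camera videos.

    Assumes the layout:
      data/chunk-<chunk_id>/episode_<episode_id>.parquet
      videos/chunk-<chunk_id>/<camera_key>/episode_<episode_id>.mp4
    """
    root = Path(dataset_root)
    index = {}

    # Low-dimensional trajectory data, one parquet file per episode
    for parquet_path in sorted(root.glob("data/chunk-*/episode_*.parquet")):
        index[parquet_path.stem] = {"parquet": parquet_path, "videos": {}}

    # Videos, grouped by the camera directory that contains them
    for video_path in sorted(root.glob("videos/chunk-*/*/episode_*.mp4")):
        entry = index.setdefault(
            video_path.stem, {"parquet": None, "videos": {}}
        )
        entry["videos"][video_path.parent.name] = video_path

    return index
```

Pass the dataset's LeRobot root (the directory containing `meta/`, `data/`, and `videos/`); each value in the returned dict then lists the MP4 path for every camera view recorded for that episode.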


Application scenarios: