Galaxea-Open-World-Dataset
Source: ModelScope community. Updated: 2026-05-19. Indexed: 2025-08-30.
Download link:
https://modelscope.cn/datasets/Galaxea/Galaxea-Open-World-Dataset
# Galaxea Open-World Dataset
[G0 project page](https://opengalaxea.github.io/G0/)
[arXiv:2509.00576](https://arxiv.org/abs/2509.00576)
[Visualizer](https://opengalaxea.github.io/G0/visualizer/index.html)
[ModelScope](https://www.modelscope.cn/organization/Galaxea)
[X](https://x.com/Galaxea_x)
[LinkedIn](https://www.linkedin.com/company/galaxeadynamics/posts/?feedView=all&viewAsMember=true)
[Discord](https://discord.gg/hB6BuUWZZA)
## Key Features
- **500+ hours** of real-world mobile manipulation data.
- All data collected using **one uniform robotic embodiment** (R1-Lite) for consistency.
- Fine-grained **subtask language annotations** (bilingual Chinese/English).
- Covers **residential**, **kitchen**, **retail**, and **office** settings.
- Dataset in **LeRobot v2.1** format.
## Dataset Structure
The dataset is organized as **227 task-level tar.gz archives** under the `lerobot/` directory. Each archive contains a self-contained LeRobot dataset for one task:
```
lerobot/
├── Adjust_The_Air_Conditioner_Temperature_20250711_006.tar.gz
├── Arrange_Fruits_20250819_011.tar.gz
├── Clean_The_Mirror_20250714_006.tar.gz
├── ...
└── Wipe_The_Sewage_Stains_With_A_Ground_Cloth_20250801_012.tar.gz
```
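The archive names above appear to follow a `<Task_Name>_<8 digits>_<3 digits>.tar.gz` pattern. The sketch below splits a filename into those parts. The pattern itself, the reading of the 8-digit field as a date, and the helper name `parse_archive_name` are assumptions inferred from the listed examples, not something the dataset documents.

```python
import re
from typing import NamedTuple


class ArchiveName(NamedTuple):
    task: str    # human-readable task name, underscores replaced by spaces
    date: str    # 8-digit field, assumed to be a YYYYMMDD date
    suffix: str  # trailing 3-digit field, meaning not documented


# Assumed pattern: <Task_Words>_<8 digits>_<3 digits>.tar.gz
_ARCHIVE_RE = re.compile(r"^(?P<task>.+)_(?P<date>\d{8})_(?P<suffix>\d{3})\.tar\.gz$")


def parse_archive_name(filename: str) -> ArchiveName:
    """Split an archive filename into task name, date field, and suffix."""
    match = _ARCHIVE_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected archive name: {filename!r}")
    return ArchiveName(
        task=match["task"].replace("_", " "),
        date=match["date"],
        suffix=match["suffix"],
    )
```

For example, `parse_archive_name("Clean_The_Mirror_20250714_006.tar.gz").task` gives `"Clean The Mirror"`, which can be handy for grouping archives by task before downloading.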
After extracting a task archive, the directory follows the standard LeRobot v2.1 layout:
```
<task_name>/
├── meta/
│   ├── info.json              # Dataset metadata (features, fps, splits, etc.)
│   ├── episodes.jsonl         # Per-episode info (tasks, length, source file)
│   └── episodes_stats.jsonl   # Per-episode statistics
├── data/
│   └── chunk-000/
│       ├── episode_000000.parquet
│       ├── episode_000001.parquet
│       └── ...
└── videos/
    └── chunk-000/
        ├── observation.images.head_rgb/
        │   ├── episode_000000.mp4
        │   └── ...
        ├── observation.images.head_right_rgb/
        ├── observation.images.left_wrist_rgb/
        └── observation.images.right_wrist_rgb/
```
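The metadata files in `meta/` can be inspected without LeRobot installed. The following is a minimal sketch using only the standard library; the helper names `load_task_meta` and `total_seconds` are hypothetical, and the `fps` and `length` keys are assumed from the comments in the layout above rather than checked against an actual archive.

```python
import json
from pathlib import Path


def load_task_meta(task_dir):
    """Read meta/info.json and meta/episodes.jsonl from an extracted task directory."""
    meta_dir = Path(task_dir) / "meta"
    info = json.loads((meta_dir / "info.json").read_text(encoding="utf-8"))
    episodes = []
    with (meta_dir / "episodes.jsonl").open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines; each remaining line is one JSON object
                episodes.append(json.loads(line))
    return info, episodes


def total_seconds(info, episodes):
    """Total recorded time, assuming `length` is a frame count and `fps` is in info.json."""
    return sum(episode["length"] for episode in episodes) / info["fps"]
```

Calling `load_task_meta("galaxea_data/<task_name>")` returns the parsed `info` dictionary and a list with one dictionary per episode.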
## LeRobot Dataset Schema
A detailed schema is available in [lerobot_info.json](https://huggingface.co/datasets/OpenGalaxea/Galaxea-Open-World-Dataset/blob/main/lerobot_info.json). Below is a summary of all features:
### Observations
| Feature | Dtype | Shape | Description |
|---|---|---|---|
| `observation.images.head_rgb` | video | (720, 1280, 3) | Head camera RGB |
| `observation.images.head_right_rgb` | video | (720, 1280, 3) | Head right camera RGB |
| `observation.images.left_wrist_rgb` | video | (720, 1280, 3) | Left wrist camera RGB |
| `observation.images.right_wrist_rgb` | video | (720, 1280, 3) | Right wrist camera RGB |
| `observation.state.left_arm` | float64 | (6,) | Left arm joint positions |
| `observation.state.left_arm.velocities` | float64 | (6,) | Left arm joint velocities |
| `observation.state.right_arm` | float64 | (6,) | Right arm joint positions |
| `observation.state.right_arm.velocities` | float64 | (6,) | Right arm joint velocities |
| `observation.state.torso` | float64 | (4,) | Torso joint positions |
| `observation.state.torso.velocities` | float64 | (4,) | Torso joint velocities |
| `observation.state.chassis` | float64 | (3,) | Chassis positions |
| `observation.state.chassis.velocities` | float64 | (3,) | Chassis velocities |
| `observation.state.chassis.imu` | float64 | (10,) | Chassis IMU (orientation, angular velocity, linear acceleration) |
| `observation.state.left_gripper` | float64 | (1,) | Left gripper state (0 = closed, 100 = open) |
| `observation.state.right_gripper` | float64 | (1,) | Right gripper state (0 = closed, 100 = open) |
| `observation.state.left_ee_pose` | float64 | (7,) | Left end-effector pose (position + quaternion) |
| `observation.state.right_ee_pose` | float64 | (7,) | Right end-effector pose (position + quaternion) |
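
The end-effector poses are 7-vectors described as position plus quaternion. A minimal sketch of splitting one follows; it assumes the first three entries are the position and the last four the quaternion. The component order inside the quaternion is not stated in this card, so the code does not interpret it. Both helper names are hypothetical.

```python
import math


def split_ee_pose(pose):
    """Split a 7-element pose into (position, quaternion) tuples.

    Assumes layout [x, y, z, q0, q1, q2, q3]; the quaternion component
    order (xyzw vs. wxyz) is NOT specified by the dataset card.
    """
    values = [float(v) for v in pose]
    if len(values) != 7:
        raise ValueError(f"expected 7 values, got {len(values)}")
    return tuple(values[:3]), tuple(values[3:])


def quaternion_norm(quaternion):
    """Euclidean norm of the quaternion; close to 1.0 for a valid rotation."""
    return math.sqrt(sum(component * component for component in quaternion))
```

Checking that `quaternion_norm` stays near 1.0 is a cheap sanity test on loaded poses that works regardless of the component order.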
### Actions
| Feature | Dtype | Shape | Description |
|---|---|---|---|
| `action.left_arm` | float64 | (6,) | Target left arm joint positions |
| `action.right_arm` | float64 | (6,) | Target right arm joint positions |
| `action.left_gripper` | float64 | (1,) | Target left gripper position |
| `action.right_gripper` | float64 | (1,) | Target right gripper position |
| `action.chassis.velocities` | float64 | (6,) | Target chassis twist (linear + angular) |
| `action.torso.velocities` | float64 | (6,) | Target torso twist (linear + angular) |
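
The six action features sum to 26 dimensions (6 + 6 + 1 + 1 + 6 + 6). Policies that expect one flat action vector need the components concatenated in a fixed order. The sketch below shows one way to do that; the ordering in `ACTION_LAYOUT` and the helper name `flatten_action` are assumptions chosen for illustration, not a layout prescribed by the dataset.

```python
# (feature key, expected size); the ordering here is an illustrative assumption.
ACTION_LAYOUT = [
    ("action.left_arm", 6),
    ("action.left_gripper", 1),
    ("action.right_arm", 6),
    ("action.right_gripper", 1),
    ("action.torso.velocities", 6),
    ("action.chassis.velocities", 6),
]


def flatten_action(frame):
    """Concatenate the per-component action features of one frame into a flat list."""
    flat = []
    for key, size in ACTION_LAYOUT:
        values = [float(v) for v in frame[key]]
        if len(values) != size:
            raise ValueError(f"{key}: expected {size} values, got {len(values)}")
        flat.extend(values)
    return flat
```

`frame` can be any mapping from feature name to a 1-D sequence, so the same helper works on plain lists as well as on 1-D arrays or tensors that yield scalars when iterated.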
### Metadata Columns
| Feature | Dtype | Description |
|---|---|---|
| `timestamp` | float32 | Timestamp within episode |
| `frame_index` | int64 | Frame index within episode |
| `episode_index` | int64 | Episode index |
| `index` | int64 | Global frame index |
| `task_index` | int64 | Subtask annotation index |
| `coarse_task_index` | int64 | Coarse-level task index |
| `quality_index` | int64 | Quality annotation index |
| `coarse_quality_index` | int64 | Coarse-level quality index |
All video streams are encoded with **AV1 codec** at **15 fps**.
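
With a fixed rate of 15 fps, frame indices and times within an episode convert directly. The sketch below assumes that `timestamp` equals `frame_index / fps`, which is the usual LeRobot convention but is not confirmed here for this dataset; the function names are hypothetical.

```python
FPS = 15  # frame rate stated for all video streams


def frame_to_seconds(frame_index, fps=FPS):
    """Time offset of a frame within its episode, assuming a constant frame rate."""
    return frame_index / fps


def seconds_to_frame(seconds, fps=FPS):
    """Nearest frame index for a time offset within an episode."""
    return round(seconds * fps)
```

For instance, frame 30 corresponds to 2.0 seconds into the episode, which is useful when seeking into the MP4 files alongside the parquet rows.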
## Quick Start
Below is an example of how to download and load a single task from the dataset using the Hugging Face Hub and LeRobot:
```python
import tarfile
from pathlib import Path

from huggingface_hub import hf_hub_download
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
from torch.utils.data import DataLoader

# 1. Download a single task archive
task_name = "Clean_The_Mirror_20250714_006"
tar_path = hf_hub_download(
    repo_id="OpenGalaxea/Galaxea-Open-World-Dataset",
    filename=f"lerobot/{task_name}.tar.gz",
    repo_type="dataset",
)

# 2. Extract it
extract_dir = Path("./galaxea_data")
extract_dir.mkdir(exist_ok=True)
with tarfile.open(tar_path, "r:gz") as tar:
    tar.extractall(path=extract_dir)

# 3. Load with LeRobot
# NOTE: if your LeRobot release has no `from_local`, the regular constructor
# `LeRobotDataset(repo_id=task_name, root=<same path>)` is the likely equivalent.
dataset = LeRobotDataset.from_local(str(extract_dir / task_name / task_name))
print(f"Number of episodes: {dataset.num_episodes}")
print(f"Number of frames: {dataset.num_frames}")
print(f"FPS: {dataset.fps}")

# 4. Access a single frame
frame = dataset[0]
print(f"\nFrame keys: {list(frame.keys())}")
print(f"Left arm state shape: {frame['observation.state.left_arm'].shape}")
print(f"Left arm action shape: {frame['action.left_arm'].shape}")

# 5. Iterate over the dataset in batches (only the first batch is shown)
dataloader = DataLoader(dataset, batch_size=32, shuffle=False)
for batch in dataloader:
    print(f"Batch observation.state.left_arm shape: {batch['observation.state.left_arm'].shape}")
    break
```
## Citation
All the data and code within this repo are under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). If you use our dataset or models, please cite:
```bibtex
@article{galaxea2025,
title={Galaxea G0: Open-World Dataset and Dual-System VLA Model},
author={Galaxea Team},
journal={arXiv preprint arXiv:2509.00576},
year={2025}
}
```
# 🚀 Galaxea Open-World Dataset
[G0 project page](https://opengalaxea.github.io/G0/)
[arXiv:2509.00576](https://arxiv.org/abs/2509.00576)
[Demo video](https://opengalaxea.github.io/G0/static/videos/demo.webm)
[Visualizer](https://opengalaxea.github.io/G0/visualizer/index.html)
[Hugging Face](https://huggingface.co/OpenGalaxea)
[X](https://x.com/Galaxea_x)
[LinkedIn](https://www.linkedin.com/company/galaxeadynamics/posts/?feedView=all&viewAsMember=true)
[Discord](https://discord.gg/hB6BuUWZZA)
# ❗ **Important:**
- The LeRobot-format dataset under the `lerobot` folder does not include torso actions. Please use the dataset in the `lerobot_opensource` folder instead.
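## Key Features
- **500+ hours** in total of real-world mobile manipulation data.
- All data collected with **one uniform robot hardware embodiment** for consistency.
- Fine-grained **subtask language annotations**.
- Covers four scene types: **residential, kitchen, retail, and office**.
- Dataset stored in **RLDS** format.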
## Dataset Structure
For convenience, the 500 hours of data are split evenly by time into four parts. A small sample dataset is also provided for a quick start.
```
rlds
├── part1_r1_lite
│   ├── 1.0.0
│   │   ├── dataset_info.json
│   │   ├── features.json
│   │   ├── merge_dataset_large_r1_lite-train.tfrecord-00000-of-02048
│   │   ├── ...
│   │   ├── merge_dataset_large_r1_lite-train.tfrecord-02047-of-02048
├── part2_r1_lite
├── part3_r1_lite
├── part4_r1_lite
├── sample
│   ├── 1.0.0
│   │   ├── dataset_info.json
│   │   ├── features.json
│   │   ├── merge_dataset_large_r1_lite-train.tfrecord-00000-of-01024
│   │   ├── ...
│   │   ├── merge_dataset_large_r1_lite-train.tfrecord-01023-of-01024
```
## Dataset Schema
The OpenGalaxea dataset structure is defined as follows:
```python
OpenGalaxeaDataset = {
    "episode_metadata": {
        "file_path": tf.Text,  # Path to the original data file
    },
    "steps": {
        "is_first": tf.Scalar(dtype=bool),  # True if this is the first step of the episode
        "is_last": tf.Scalar(dtype=bool),   # True if this is the last step of the episode
        "language_instruction": tf.Text,    # Language instruction, formatted as "high-level instruction"@"low-level instruction (Chinese)"@"low-level instruction (English)"
        "observation": {
            "base_velocity": tf.Tensor(3, dtype=float32),        # Robot base velocity
            "gripper_state_left": tf.Tensor(1, dtype=float32),   # Left gripper state, 0 = closed, 100 = fully open
            "gripper_state_right": tf.Tensor(1, dtype=float32),  # Right gripper state, 0 = closed, 100 = fully open
            "depth_camera_wrist_left": tf.Tensor(224, 224, 1, dtype=uint16),   # Left wrist depth camera image, in millimeters
            "depth_camera_wrist_right": tf.Tensor(224, 224, 1, dtype=uint16),  # Right wrist depth camera image, in millimeters
            "image_camera_head": tf.Tensor(224, 224, 3, dtype=uint8),          # Head camera RGB image
            "image_camera_wrist_left": tf.Tensor(224, 224, 3, dtype=uint8),    # Left wrist RGB camera image
            "image_camera_wrist_right": tf.Tensor(224, 224, 3, dtype=uint8),   # Right wrist RGB camera image
            "joint_position_arm_left": tf.Tensor(6, dtype=float32),   # Left arm joint positions
            "joint_position_arm_right": tf.Tensor(6, dtype=float32),  # Right arm joint positions
            "joint_position_torso": tf.Tensor(4, dtype=float32),      # Torso joint positions
            "joint_velocity_arm_left": tf.Tensor(6, dtype=float32),   # Left arm joint velocities
            "joint_velocity_arm_right": tf.Tensor(6, dtype=float32),  # Right arm joint velocities
            "last_action": tf.Tensor(26, dtype=float32),              # Previous action (action history)
        },
        # Action dimensions: 26 = 6 (left arm joints) + 1 (left gripper) + 6 (right arm joints) + 1 (right gripper) + 6 (torso joints) + 6 (base motion parameters)
        "action": tf.Tensor(26, dtype=float32),  # Robot action: left arm 6 joint velocities, left gripper position, right arm 6 joint velocities, right gripper position, torso 6 joint velocities, and 6 base motion parameters
        "segment_idx": tf.Scalar(dtype=int32),   # Segment index within the episode
        "variant_idx": tf.Scalar(dtype=int32),
    },
}
```
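
Two fields in the schema pack several values into one: `language_instruction` joins three strings with `@`, and `action` concatenates six components into 26 numbers. The sketch below unpacks both. It assumes the three instruction parts are separated by a literal `@` with optional surrounding double quotes, and that the action components are stored in the order listed in the schema comment; neither is verified against the data, and the helper names are hypothetical.

```python
def parse_instruction(text):
    """Split 'high@low_zh@low_en' into its three parts (assumed '@'-separated)."""
    parts = text.split("@")
    if len(parts) != 3:
        raise ValueError(f"expected 3 '@'-separated parts, got {len(parts)}")
    high, low_zh, low_en = (part.strip().strip('"') for part in parts)
    return {"high_level": high, "low_level_zh": low_zh, "low_level_en": low_en}


# Assumed storage order, following the schema comment: 6 + 1 + 6 + 1 + 6 + 6 = 26.
ACTION_SLICES = {
    "left_arm": slice(0, 6),
    "left_gripper": slice(6, 7),
    "right_arm": slice(7, 13),
    "right_gripper": slice(13, 14),
    "torso": slice(14, 20),
    "base": slice(20, 26),
}


def split_action(action):
    """Split a 26-element action sequence into named components."""
    values = list(action)
    if len(values) != 26:
        raise ValueError(f"expected 26 values, got {len(values)}")
    return {name: values[part] for name, part in ACTION_SLICES.items()}
```

When reading from TFDS, decode the instruction bytes with `.decode("utf-8")` before passing them to `parse_instruction`.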
## Example
We provide an example script that loads this RLDS dataset and converts some episodes to MP4 videos (head camera view).
```python
import os

import imageio
import tensorflow_datasets as tfds
import tyro
from tqdm import tqdm


def main(
    dataset_name: str,
    data_dir: str,
    output_dir: str = "extracted_videos",
    num_trajs: int = 10,
):
    ds = tfds.load(dataset_name, split="train", data_dir=data_dir)
    print(f"Loaded dataset: {dataset_name}")
    os.makedirs(output_dir, exist_ok=True)
    print(f"Videos will be saved to: {output_dir}")
    for i, episode in enumerate(tqdm(ds.take(num_trajs), total=num_trajs, desc="Exporting videos")):
        head_frames = []
        for step in episode["steps"]:
            # Collect head-camera RGB frames for this episode
            head_rgb_image = step["observation"]["image_camera_head"].numpy()
            head_frames.append(head_rgb_image)
            # Overwritten every step, so the last step's instruction is reported
            instruction = step["language_instruction"].numpy().decode("utf-8")
        video_path = os.path.join(output_dir, f"traj_{i}_head_rgb.mp4")
        try:
            imageio.mimsave(video_path, head_frames, fps=15)
            print(f"Saved video for episode {i} to {video_path}, instruction: '{instruction}'")
        except Exception as e:
            print(f"Error saving video for episode {i}: {e}")


if __name__ == "__main__":
    tyro.cli(main)
```
## 📜 Citation
All the data and code within this repo are released under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). If you use this dataset or the models, please cite:
```bibtex
@article{galaxea2025,
  title={Galaxea G0: Open-World Dataset and Dual-System VLA Model},
  author={Galaxea Team},
  journal={arXiv preprint arXiv:2509.00576},
  year={2025}
}
```
Provider: maas
Created: 2025-08-26



