RoboCOIN/Agilex_Cobot_Magic_fold_short_sleeve_white
Source: Hugging Face | Updated: 2026-04-02 | Indexed: 2026-04-05
Download link:
https://hf-mirror.com/datasets/RoboCOIN/Agilex_Cobot_Magic_fold_short_sleeve_white
Resource overview:
---
task_categories:
- robotics
language:
- en
extra_gated_prompt: 'By accessing this dataset, you agree to cite the associated paper in your research/publications—see the "Citation" section for details. You agree to not use the dataset to conduct experiments that cause harm to human subjects.'
extra_gated_fields:
  Company/Organization:
    type: 'text'
    description: 'e.g., "ETH Zurich", "Boston Dynamics", "Independent Researcher"'
  Country:
    type: 'country'
    description: 'e.g., "Germany", "China", "United States"'
tags:
- RoboCOIN
- LeRobot
license: apache-2.0
configs:
- config_name: default
  data_files: data/chunk-{id}/episode_{id}.parquet
---
# Agilex_Cobot_Magic_fold_short_sleeve_white
## Dataset Description
This dataset uses an extended format based on LeRobot and is fully compatible with LeRobot.
## Task Preview
<video src="videos/chunk-000/observation.images.cam_head_rgb/episode_000000.mp4" controls width="640"></video>
[View Video Directly](videos/chunk-000/observation.images.cam_head_rgb/episode_000000.mp4)
### Overview
- **Total Episodes:** 50
- **Total Frames:** 70709
- **FPS:** 30
- **Dataset Size:** 701.06 MB
- **Robot Name:** `Agilex_Cobot_Magic`
- **End-Effector Type:** `two_finger_gripper`
- **Teleoperation Type:** Not currently provided for this dataset.
- **Sensors:** `cam_head_rgb`, `cam_left_wrist_rgb`, `cam_right_wrist_rgb`
- **Camera Information:** `cam_head_rgb`, `cam_left_wrist_rgb`, `cam_right_wrist_rgb`
- **Scene:** `household->bedroom`
- **Objects:** `table(unknown)`, `white_T-shirt(unknown)`, `green_tray(unknown)`
- **Task Description:** use two grippers to fold the white short sleeve, and use the left claw to place the folded white short sleeve on the tray.
### Primary Task Instruction
> use two grippers to fold the white short sleeve, and use the left claw to place the folded white short sleeve on the tray.
### Robot Configuration
- **Robot Name:** `Agilex_Cobot_Magic`
- **Codebase Version:** `v2.1`
- **End-Effector Type:** `two_finger_gripper`
- **Teleoperation Type:** Not currently provided for this dataset.
## Scene and Objects
### Scene Type
`household->bedroom`
### Objects
- `table(unknown)`
- `white_T-shirt(unknown)`
- `green_tray(unknown)`
## Task Descriptions
- **Standardized Task Description:** `use two grippers to fold the white short sleeve, and use the left claw to place the folded white short sleeve on the tray.`
- **Operation Type:** Not currently provided for this dataset.
- **Environment Type:** Not currently provided for this dataset.
### Sub-Tasks
This dataset includes 11 distinct subtasks:
1. **Lift the white T-shirt with the left gripper** (Index: 0)
2. **Place the folded white T-shirt on the green tray with the left gripper** (Index: 1)
3. **Fold the white T-shirt from left to right with left gripper** (Index: 2)
4. **Fold the white T-shirt from right to left with right gripper** (Index: 3)
5. **Grasp the white T-shirt with the right gripper** (Index: 4)
6. **End** (Index: 5)
7. **Fold the white T-shirt downward with the left gripper** (Index: 6)
8. **Fold the white T-shirt downward with the right gripper** (Index: 7)
9. **Lift the white T-shirt with the right gripper** (Index: 8)
10. **Grasp the white T-shirt with the left gripper** (Index: 9)
11. **null** (Index: 10)
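The subtask indices listed above can be resolved back to their text at load time. The following is a minimal sketch, assuming `meta/tasks.jsonl` follows the LeRobot v2.1 layout of one JSON object per line with `task_index` and `task` keys. The sample text is hand-written from the list above rather than copied from the file, and `parse_tasks_jsonl` is a hypothetical helper name.

```python
import json


def parse_tasks_jsonl(text: str) -> dict[int, str]:
    """Map task_index -> task string from JSONL text (one JSON object per line)."""
    tasks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        tasks[int(record["task_index"])] = record["task"]
    return tasks


# Hand-written sample mirroring three entries from the list above (assumed layout).
SAMPLE = """
{"task_index": 0, "task": "Lift the white T-shirt with the left gripper"}
{"task_index": 5, "task": "End"}
{"task_index": 10, "task": "null"}
"""

tasks = parse_tasks_jsonl(SAMPLE)
print(tasks[0])
```

Note that indices 5 (`End`) and 10 (`null`) are markers rather than manipulation steps, so a consumer may want to filter them out before using the labels for supervision.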
### Atomic Actions
- `grasp`
- `lift`
- `lower`
- `fold`
## Hardware and Sensors
### Sensors
- `cam_head_rgb`
- `cam_left_wrist_rgb`
- `cam_right_wrist_rgb`
### Camera Information
- `cam_head_rgb`: dtype=video, shape=480x640x3, resolution=640x480, codec=av1, pix_fmt=yuv420p
- `cam_left_wrist_rgb`: dtype=video, shape=480x640x3, resolution=640x480, codec=av1, pix_fmt=yuv420p
- `cam_right_wrist_rgb`: dtype=video, shape=480x640x3, resolution=640x480, codec=av1, pix_fmt=yuv420p
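To get a feel for memory requirements when decoding the AV1 streams, the frame shape above can be turned into a raw byte count. This is a rough sketch, assuming frames are decoded to 8-bit RGB and that the total frame count (70709) applies to each of the three cameras; the constant and function names are illustrative.

```python
HEIGHT, WIDTH, CHANNELS = 480, 640, 3  # per-camera frame shape listed above
TOTAL_FRAMES = 70709                   # assumed to be the frame count per camera stream
NUM_CAMERAS = 3


def raw_video_bytes(frames: int, cameras: int) -> int:
    """Bytes needed to hold every frame as uint8 RGB, with no compression."""
    return frames * cameras * HEIGHT * WIDTH * CHANNELS


bytes_per_frame = HEIGHT * WIDTH * CHANNELS
total_raw = raw_video_bytes(TOTAL_FRAMES, NUM_CAMERAS)
print(f"{bytes_per_frame} bytes per decoded frame")
print(f"{total_raw / 1e9:.1f} GB for all decoded frames")
```

Under these assumptions the decoded frames are far larger than the dataset's on-disk size, so decoding on the fly per batch is usually more practical than caching every frame in memory.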
### Coordinate System
- **Definition:** `right-hand-frame`
### Dimensions & Units
- **Joint Rotation:** `radian`
- **End-Effector Rotation:** `radian`
- **End-Effector Translation:** `meter`
## Dataset Statistics
| Metric | Value |
|--------|-------|
| **Total Episodes** | 50 |
| **Total Frames** | 70709 |
| **Total Tasks** | 11 |
| **Total Videos** | 150 |
| **Total Chunks** | 1 |
| **Chunk Size** | 1000 |
| **FPS** | 30 |
| **State Dimensions** | 26 |
| **Action Dimensions** | 26 |
| **Camera Views** | 3 |
| **Dataset Size** | 701.06 MB |
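A few derived quantities follow directly from the table. The sketch below assumes a constant frame rate and that each episode's timestamps start at zero, so that a frame index maps to `frame_index / FPS` seconds; `frame_to_seconds` is an illustrative helper, not part of LeRobot.

```python
TOTAL_EPISODES = 50
TOTAL_FRAMES = 70709
FPS = 30
CAMERA_VIEWS = 3


def frame_to_seconds(frame_index: float, fps: int = FPS) -> float:
    """Convert a frame index to elapsed seconds, assuming a constant frame rate."""
    return frame_index / fps


mean_frames = TOTAL_FRAMES / TOTAL_EPISODES       # average episode length in frames
mean_seconds = frame_to_seconds(mean_frames)      # average episode length in seconds
total_minutes = frame_to_seconds(TOTAL_FRAMES) / 60
expected_videos = TOTAL_EPISODES * CAMERA_VIEWS   # one video per episode per camera

print(f"mean episode: {mean_frames:.2f} frames, {mean_seconds:.1f} s")
print(f"total recording time: {total_minutes:.1f} min")
print(f"expected video files: {expected_videos}")
```

The expected video count matches the **Total Videos** row, which is a quick consistency check on the table.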
## Data Splits
The dataset is organized into the following splits:
- **Training**: Episodes 0:49 (all 50 episodes)
## Dataset Structure
This dataset follows the LeRobot format and contains the following components:
### Data Files
- **Videos**: Compressed video files containing RGB camera observations
- **State Data**: Robot joint positions, velocities, and other state information
- **Action Data**: Robot action commands and trajectories
- **Metadata**: Episode metadata, timestamps, and annotations
### File Organization
- **Data Path Pattern**: `data/chunk-{id}/episode_{id}.parquet`
- **Video Path Pattern**: `videos/chunk-{id}/observation.images.cam_left_wrist_rgb/episode_{id}.mp4`
- **Chunking**: Data is organized into 1 chunk (chunk size: 1000)
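The path patterns can be expanded into concrete file names. This is a minimal sketch, assuming the zero-padding widths seen in this card (`chunk-000`, `episode_000000`), the `.mp4` extension used by the Task Preview path, and the usual LeRobot convention that the chunk number is the episode index divided by the chunk size. `episode_paths` and the template names are hypothetical.

```python
CHUNK_SIZE = 1000  # chunk size reported for this dataset

# Padding widths are inferred from names such as chunk-000 and episode_000000.
DATA_TEMPLATE = "data/chunk-{chunk:03d}/episode_{episode:06d}.parquet"
VIDEO_TEMPLATE = (
    "videos/chunk-{chunk:03d}/observation.images.{camera}/episode_{episode:06d}.mp4"
)


def episode_paths(episode: int, camera: str, chunk_size: int = CHUNK_SIZE) -> tuple[str, str]:
    """Return (parquet_path, video_path) for one episode and one camera."""
    chunk = episode // chunk_size  # assumed chunk assignment rule
    data_path = DATA_TEMPLATE.format(chunk=chunk, episode=episode)
    video_path = VIDEO_TEMPLATE.format(chunk=chunk, camera=camera, episode=episode)
    return data_path, video_path


for cam in ("cam_head_rgb", "cam_left_wrist_rgb", "cam_right_wrist_rgb"):
    print(episode_paths(0, cam))
```

With 50 episodes and a chunk size of 1000, every episode lands in `chunk-000`, consistent with the single chunk reported here.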
### Data Structure (Tree)
```
Agilex_Cobot_Magic_fold_short_sleeve_white_qced_hardlink/
|-- annotations
|   |-- eef_acc_mag_annotation.jsonl
|   |-- eef_direction_annotation.jsonl
|   |-- eef_velocity_annotation.jsonl
|   |-- gripper_activity_annotation.jsonl
|   |-- gripper_mode_annotation.jsonl
|   |-- scene_annotations.jsonl
|   `-- subtask_annotations.jsonl
|-- data
|   `-- chunk-000
|       |-- episode_000000.parquet
|       |-- episode_000001.parquet
|       |-- episode_000002.parquet
|       |-- episode_000003.parquet
|       |-- episode_000004.parquet
|       |-- episode_000005.parquet
|       |-- episode_000006.parquet
|       |-- episode_000007.parquet
|       |-- episode_000008.parquet
|       |-- episode_000009.parquet
|       |-- episode_000010.parquet
|       |-- episode_000011.parquet
|       `-- ... (38 more entries)
|-- meta
|   |-- episodes.jsonl
|   |-- episodes_stats.jsonl
|   |-- info.json
|   `-- tasks.jsonl
`-- videos
    `-- chunk-000
        |-- observation.images.cam_head_rgb
        |-- observation.images.cam_left_wrist_rgb
        `-- observation.images.cam_right_wrist_rgb
```
## Camera Views
This dataset includes 3 camera views: `cam_head_rgb`, `cam_left_wrist_rgb`, `cam_right_wrist_rgb`.
## Features (Full YAML)
```yaml
observation.images.cam_head_rgb:
  dtype: video
  shape:
  - 480
  - 640
  - 3
  names:
  - height
  - width
  - channels
  info:
    video.height: 480
    video.width: 640
    video.codec: av1
    video.pix_fmt: yuv420p
    video.is_depth_map: false
    video.fps: 30
    video.channels: 3
    has_audio: false
observation.images.cam_left_wrist_rgb:
  dtype: video
  shape:
  - 480
  - 640
  - 3
  names:
  - height
  - width
  - channels
  info:
    video.height: 480
    video.width: 640
    video.codec: av1
    video.pix_fmt: yuv420p
    video.is_depth_map: false
    video.fps: 30
    video.channels: 3
    has_audio: false
observation.images.cam_right_wrist_rgb:
  dtype: video
  shape:
  - 480
  - 640
  - 3
  names:
  - height
  - width
  - channels
  info:
    video.height: 480
    video.width: 640
    video.codec: av1
    video.pix_fmt: yuv420p
    video.is_depth_map: false
    video.fps: 30
    video.channels: 3
    has_audio: false
observation.state:
  dtype: float32
  shape:
  - 26
  names:
  - left_arm_joint_1_rad
  - left_arm_joint_2_rad
  - left_arm_joint_3_rad
  - left_arm_joint_4_rad
  - left_arm_joint_5_rad
  - left_arm_joint_6_rad
  - left_gripper_open
  - left_eef_pos_x_m
  - left_eef_pos_y_m
  - left_eef_pos_z_m
  - left_eef_rot_euler_x_rad
  - left_eef_rot_euler_y_rad
  - left_eef_rot_euler_z_rad
  - right_arm_joint_1_rad
  - right_arm_joint_2_rad
  - right_arm_joint_3_rad
  - right_arm_joint_4_rad
  - right_arm_joint_5_rad
  - right_arm_joint_6_rad
  - right_gripper_open
  - right_eef_pos_x_m
  - right_eef_pos_y_m
  - right_eef_pos_z_m
  - right_eef_rot_euler_x_rad
  - right_eef_rot_euler_y_rad
  - right_eef_rot_euler_z_rad
action:
  dtype: float32
  shape:
  - 26
  names:
  - left_arm_joint_1_rad
  - left_arm_joint_2_rad
  - left_arm_joint_3_rad
  - left_arm_joint_4_rad
  - left_arm_joint_5_rad
  - left_arm_joint_6_rad
  - left_gripper_open
  - left_eef_pos_x_m
  - left_eef_pos_y_m
  - left_eef_pos_z_m
  - left_eef_rot_euler_x_rad
  - left_eef_rot_euler_y_rad
  - left_eef_rot_euler_z_rad
  - right_arm_joint_1_rad
  - right_arm_joint_2_rad
  - right_arm_joint_3_rad
  - right_arm_joint_4_rad
  - right_arm_joint_5_rad
  - right_arm_joint_6_rad
  - right_gripper_open
  - right_eef_pos_x_m
  - right_eef_pos_y_m
  - right_eef_pos_z_m
  - right_eef_rot_euler_x_rad
  - right_eef_rot_euler_y_rad
  - right_eef_rot_euler_z_rad
timestamp:
  dtype: float32
  shape:
  - 1
  names: null
frame_index:
  dtype: int64
  shape:
  - 1
  names: null
episode_index:
  dtype: int64
  shape:
  - 1
  names: null
index:
  dtype: int64
  shape:
  - 1
  names: null
task_index:
  dtype: int64
  shape:
  - 1
  names: null
subtask_annotation:
  names: null
  dtype: int32
  shape:
  - 5
scene_annotation:
  names: null
  dtype: int32
  shape:
  - 1
eef_sim_pose_state:
  names:
  - left_eef_pos_x
  - left_eef_pos_y
  - left_eef_pos_z
  - left_eef_rot_x
  - left_eef_rot_y
  - left_eef_rot_z
  - right_eef_pos_x
  - right_eef_pos_y
  - right_eef_pos_z
  - right_eef_rot_x
  - right_eef_rot_y
  - right_eef_rot_z
  dtype: float32
  shape:
  - 12
eef_sim_pose_action:
  names:
  - left_eef_pos_x
  - left_eef_pos_y
  - left_eef_pos_z
  - left_eef_rot_x
  - left_eef_rot_y
  - left_eef_rot_z
  - right_eef_pos_x
  - right_eef_pos_y
  - right_eef_pos_z
  - right_eef_rot_x
  - right_eef_rot_y
  - right_eef_rot_z
  dtype: float32
  shape:
  - 12
eef_direction_state:
  names:
  - left_eef_direction
  - right_eef_direction
  dtype: int32
  shape:
  - 2
eef_direction_action:
  names:
  - left_eef_direction
  - right_eef_direction
  dtype: int32
  shape:
  - 2
eef_velocity_state:
  names:
  - left_eef_velocity
  - right_eef_velocity
  dtype: int32
  shape:
  - 2
eef_velocity_action:
  names:
  - left_eef_velocity
  - right_eef_velocity
  dtype: int32
  shape:
  - 2
eef_acc_mag_state:
  names:
  - left_eef_acc_mag
  - right_eef_acc_mag
  dtype: int32
  shape:
  - 2
eef_acc_mag_action:
  names:
  - left_eef_acc_mag
  - right_eef_acc_mag
  dtype: int32
  shape:
  - 2
gripper_mode_state:
  names:
  - left_gripper_mode
  - right_gripper_mode
  dtype: int32
  shape:
  - 2
gripper_mode_action:
  names:
  - left_gripper_mode
  - right_gripper_mode
  dtype: int32
  shape:
  - 2
gripper_activity_state:
  names:
  - left_gripper_activity
  - right_gripper_activity
  dtype: int32
  shape:
  - 2
gripper_activity_action:
  names:
  - left_gripper_activity
  - right_gripper_activity
  dtype: int32
  shape:
  - 2
gripper_open_scale_state:
  names:
  - left_gripper_open_scale
  - right_gripper_open_scale
  dtype: float32
  shape:
  - 2
gripper_open_scale_action:
  names:
  - left_gripper_open_scale
  - right_gripper_open_scale
  dtype: float32
  shape:
  - 2
```
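The 26-dimensional `observation.state` and `action` vectors share one layout: for each arm, six joint angles, one gripper value, three end-effector position components, and three Euler angles, with the left arm first. A minimal sketch of slicing such a vector into named groups follows; the group names and `split_vector` are illustrative and not part of the dataset or of LeRobot.

```python
# Per-arm field groups, in the order the names appear in the feature list.
ARM_FIELDS = [
    ("arm_joints_rad", 6),
    ("gripper_open", 1),
    ("eef_pos_m", 3),
    ("eef_rot_euler_rad", 3),
]
VECTOR_DIM = 26


def split_vector(vec) -> dict[str, list]:
    """Split a 26-element state or action vector into named per-arm groups."""
    if len(vec) != VECTOR_DIM:
        raise ValueError(f"expected {VECTOR_DIM} values, got {len(vec)}")
    parts = {}
    offset = 0
    for side in ("left", "right"):  # left-arm fields come first in the vector
        for name, width in ARM_FIELDS:
            parts[f"{side}_{name}"] = list(vec[offset:offset + width])
            offset += width
    return parts


# Use positions as values so each group shows which indices it covers.
parts = split_vector(list(range(26)))
for key, indices in parts.items():
    print(key, indices)
```

Keeping the layout in one table (`ARM_FIELDS`) avoids hard-coding 26 separate offsets and makes it easy to select only joint-space or only end-effector targets for a policy.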
## Available Annotations
This dataset includes rich annotations to support diverse learning approaches:
- `eef_acc_mag_annotation.jsonl`
- `eef_direction_annotation.jsonl`
- `eef_velocity_annotation.jsonl`
- `gripper_activity_annotation.jsonl`
- `gripper_mode_annotation.jsonl`
- `scene_annotations.jsonl`
- `subtask_annotations.jsonl`
## Dataset Tags
- `RoboCOIN`
- `LeRobot`
## Authors
### Contributors
This dataset is contributed by:
- RoboCOIN Team at Beijing Academy of Artificial Intelligence (BAAI)
### Annotators
No annotator information available.
## Links
- **Homepage:** [https://flagopen.github.io/RoboCOIN/](https://flagopen.github.io/RoboCOIN/)
- **Paper:** [https://arxiv.org/abs/2511.17441](https://arxiv.org/abs/2511.17441)
- **Repository:** [https://github.com/FlagOpen/RoboCOIN](https://github.com/FlagOpen/RoboCOIN)
## Contact and Support
For questions, issues, or feedback regarding this dataset, please contact us.
### Support
For technical support, please open an issue on our GitHub repository.
## License
apache-2.0
## Citation
If you use this dataset in your research, please cite:
```bibtex
@article{robocoin,
title={RoboCOIN: An Open-Sourced Bimanual Robotic Data Collection for Integrated Manipulation},
author={Shihan Wu and Xuecheng Liu and Shaoxuan Xie and Pengwei Wang and Xinghang Li and Bowen Yang and Zhe Li and Kai Zhu and Hongyu Wu and Yiheng Liu and Zhaoye Long and Yue Wang and Chong Liu and Dihan Wang and Ziqiang Ni and Xiang Yang and You Liu and Ruoxuan Feng and Runtian Xu and Lei Zhang and Denghang Huang and Chenghao Jin and Anlan Yin and Xinlong Wang and Zhenguo Sun and Junkai Zhao and Mengfei Du and Mingyu Cao and Xiansheng Chen and Hongyang Cheng and Xiaojie Zhang and Yankai Fu and Ning Chen and Cheng Chi and Sixiang Chen and Huaihai Lyu and Xiaoshuai Hao and Yequan Wang and Bo Lei and Dong Liu and Xi Yang and Yance Jiao and Tengfei Pan and Yunyan Zhang and Songjing Wang and Ziqian Zhang and Xu Liu and Ji Zhang and Caowei Meng and Zhizheng Zhang and Jiyang Gao and Song Wang and Xiaokun Leng and Zhiqiang Xie and Zhenzhen Zhou and Peng Huang and Wu Yang and Yandong Guo and Yichao Zhu and Suibing Zheng and Hao Cheng and Xinmin Ding and Yang Yue and Huanqian Wang and Chi Chen and Jingrui Pang and YuXi Qian and Haoran Geng and Lianli Gao and Haiyuan Li and Bin Fang and Gao Huang and Yaodong Yang and Hao Dong and He Wang and Hang Zhao and Yadong Mu and Di Hu and Hao Zhao and Tiejun Huang and Shanghang Zhang and Yonghua Lin and Zhongyuan Wang and Guocai Yao},
journal={arXiv preprint arXiv:2511.17441},
url = {https://arxiv.org/abs/2511.17441},
year={2025},
}
```
### Additional References
If you use this dataset, please also consider citing:
LeRobot Framework: https://github.com/huggingface/lerobot
## Version Information
Initial Release
Provider:
RoboCOIN
Curated summary
Dataset introduction

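Construction
In robot manipulation, high-quality datasets are essential for advancing imitation learning and reinforcement learning algorithms. The Agilex_Cobot_Magic_fold_short_sleeve_white dataset is built on the extended LeRobot format, which keeps the data standardized and compatible. It was collected on the dual-arm Agilex_Cobot_Magic robot, whose end effectors are two-finger grippers, performing the task of folding a white short-sleeve T-shirt and placing it on a green tray in a mock household bedroom scene. Collection spans 50 complete episodes totaling 70,709 frames, recorded at 30 FPS from three RGB cameras (head, left wrist, and right wrist), with the robot's 26-dimensional state and action data synchronized to the video. The data is organized in chunks and stored as Parquet files for efficient access and processing.
Characteristics
The dataset is multimodal and finely annotated. It provides three RGB views, each at 640x480 and encoded with AV1, that capture the robot's coordinated two-arm motion from different viewpoints. Beyond the low-dimensional state and action signals (joint angles and end-effector poses), it includes annotation layers for subtask segmentation, scene labels, and end-effector velocity, acceleration, and gripper mode. These annotations break the continuous manipulation into sequences of atomic actions such as grasp, lift, and fold, giving structured supervision for methods such as hierarchical reinforcement learning or behavior cloning. At about 701 MB in total, the dataset is moderate in size and easy to work with.
Usage
Researchers can use this dataset to train and evaluate robot learning models. The dataset is fully compatible with the LeRobot framework and can be loaded through that framework's standard interfaces. Data is stored under the pattern `data/chunk-{id}/episode_{id}.parquet`, and video files are stored in the corresponding `videos` directory. The dataset suits end-to-end visuomotor policy learning, where a model takes the three RGB video streams as input and predicts the robot's 26-dimensional action command. The subtask and atomic-action annotations also support research on task planning and skill abstraction, for example training a task-recognition model or building a hierarchical policy. Before use, you must accept the access terms and agree to cite the specified paper in your research. The data is assigned to a single training split covering all 50 episodes and can be used directly for model training.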
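Background and challenges
Background overview
In robot manipulation, having bimanual robots carry out delicate everyday tasks such as folding clothes is an important frontier for generalizing embodied intelligence to complex environments. The Agilex_Cobot_Magic_fold_short_sleeve_white dataset was built and released in 2025 by the RoboCOIN team at the Beijing Academy of Artificial Intelligence (BAAI) as part of the RoboCOIN project, which aims to provide high-quality, multimodal, real-world data for bimanual manipulation. The dataset focuses on folding and placing a white short-sleeve T-shirt in a household bedroom scene. It was collected on the Agilex_Cobot_Magic platform and contains 50 complete episodes and 70,709 frames, with RGB cameras on the head and both wrists. The core research problem is perception and planning for dexterous, sequential manipulation of non-rigid objects. Because it follows the LeRobot format, the dataset can be used directly to train and evaluate imitation learning and reinforcement learning algorithms on real robot tasks.
Current challenges
The dataset targets the inherent difficulty of dexterous manipulation of non-rigid objects. Folding clothes involves precisely grasping, lifting, folding, and placing a soft, easily deformed object. The robot must perceive the object's changing state (wrinkles, shape) and generate coordinated two-arm trajectories in a high-dimensional, continuous action space. Building the dataset poses its own challenges. Keeping the multi-view video (head, left wrist, right wrist) precisely synchronized in time is difficult. Collecting real physical interaction data means dealing with noise from robot control, sensor calibration, and changes in ambient lighting. Finally, providing fine-grained subtask and atomic-action annotations (grasp, lift, fold, and so on) for 70,709 frames requires substantial manual effort to keep the labels accurate and consistent, so that they can support supervised learning and skill decomposition research.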
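Common scenarios
Typical use cases
In robot manipulation learning, the Agilex_Cobot_Magic_fold_short_sleeve_white dataset provides standardized demonstrations of coordinated two-arm clothes folding. It records the full process of the Agilex_Cobot_Magic robot folding a white short-sleeve T-shirt and placing it on a green tray in a bedroom scene, giving imitation learning and reinforcement learning algorithms synchronized sequences of multi-view visual observations, robot states, and action trajectories. Its typical use is training end-to-end policy models so that a robot can learn complex bimanual skills from human demonstrations, in particular fine motion planning for soft, deformable objects.
Practical applications
In practice, the skills captured by this dataset are relevant to household service robots, for example automated laundry tidying, sorting of soft items in warehouse logistics, and handling everyday objects in rehabilitation and assistive settings. Models trained on this kind of data can adapt better to unstructured environments and depend less on precise pre-programming, which helps move robots from industrial assembly lines into everyday settings and supports household automation and personalized assistance.
Derived and related work
Work derived from this dataset sits mainly within the RoboCOIN project ecosystem. By following the LeRobot data format, it contributes to standardization in the open-source robot data community. According to this summary, related research has used its multi-view video and state-action pairs to develop Transformer-based sequence prediction models, vision-language-action planning frameworks, and hierarchical reinforcement learning methods for skill composition. This work both validates the dataset's usefulness and advances algorithms and benchmarks for bimanual manipulation, imitation learning, and real-world robot learning.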
The content above was collected and summarized by 遇见数据集.



