RoboCOIN/Agilex_Cobot_Magic_fold_towel_brown

Name: RoboCOIN/Agilex_Cobot_Magic_fold_towel_brown
Creator: RoboCOIN
Published: 2026-04-02 15:30:57
License: No description provided

Source: Hugging Face. Updated: 2026-04-02. Indexed: 2026-04-05.

Download link:

https://hf-mirror.com/datasets/RoboCOIN/Agilex_Cobot_Magic_fold_towel_brown

Resource overview:

---
task_categories:
- robotics
language:
- en
extra_gated_prompt: 'By accessing this dataset, you agree to cite the associated paper in your research/publications; see the "Citation" section for details. You agree to not use the dataset to conduct experiments that cause harm to human subjects.'
extra_gated_fields:
  Company/Organization:
    type: 'text'
    description: 'e.g., "ETH Zurich", "Boston Dynamics", "Independent Researcher"'
  Country:
    type: 'country'
    description: 'e.g., "Germany", "China", "United States"'
tags:
- RoboCOIN
- LeRobot
license: apache-2.0
configs:
- config_name: default
  data_files: data/chunk-{id}/episode_{id}.parquet
---

# Agilex_Cobot_Magic_fold_towel_brown

## Dataset Description

This dataset uses an extended format based on LeRobot and is fully compatible with LeRobot.

## Task Preview

<video src="videos/chunk-000/observation.images.cam_head_rgb/episode_000000.mp4" controls width="640"></video>

[View Video Directly](videos/chunk-000/observation.images.cam_head_rgb/episode_000000.mp4)

### Overview

- **Total Episodes:** 387
- **Total Frames:** 283464
- **FPS:** 30
- **Dataset Size:** 18.63 GB
- **Robot Name:** `Agilex_Cobot_Magic`
- **End-Effector Type:** `two_finger_gripper`
- **Teleoperation Type:** `Due to some reasons, this dataset temporarily cannot provide the teleoperation type information.`
- **Sensors:** `cam_head_rgb`, `cam_right_wrist_rgb`, `cam_left_wrist_rgb`
- **Camera Information:** cam_head_rgb; cam_right_wrist_rgb; cam_left_wrist_rgb
- **Scene:** `household->bathroom`
- **Objects:** `table(unknown)`, `basket(unknown)`, `towel(unknown)`
- **Task Description:** fold the towel on the table three times with two grippers.

### Primary Task Instruction

> fold the towel on the table three times with two grippers.
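
The overview figures imply a few derived quantities that can help when budgeting storage or training time. The following is a minimal sketch that uses only the numbers listed above; the variable names are illustrative, and the one-video-per-episode-per-camera count is an assumption rather than something the overview states.

```python
# Figures copied from the overview above.
TOTAL_EPISODES = 387
TOTAL_FRAMES = 283464
FPS = 30
CAMERAS = ["cam_head_rgb", "cam_right_wrist_rgb", "cam_left_wrist_rgb"]

# Average episode length in frames and in seconds.
mean_frames = TOTAL_FRAMES / TOTAL_EPISODES
mean_seconds = mean_frames / FPS

# Total recorded time per camera stream, in minutes.
total_minutes = TOTAL_FRAMES / FPS / 60

# Assumption: one video file per episode per camera.
expected_videos = TOTAL_EPISODES * len(CAMERAS)

print(f"mean episode length: {mean_frames:.1f} frames ({mean_seconds:.1f} s)")
print(f"total duration per camera: {total_minutes:.1f} min")
print(f"expected video files: {expected_videos}")
```

Under that assumption the expected file count equals the 1161 videos reported in the dataset statistics.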
### Robot Configuration

- **Robot Name:** `Agilex_Cobot_Magic`
- **Codebase Version:** `v2.1`
- **End-Effector Type:** `two_finger_gripper`
- **Teleoperation Type:** `Due to some reasons, this dataset temporarily cannot provide the teleoperation type information.`

## Scene and Objects

### Scene Type

`household->bathroom`

### Objects

- `table(unknown)`
- `basket(unknown)`
- `towel(unknown)`

## Task Descriptions

- **Standardized Task Description:** `fold the towel on the table three times with two grippers.`
- **Operation Type:** `Due to some reasons, this dataset temporarily cannot provide the operation type information.`
- **Environment Type:** `Due to some reasons, this dataset temporarily cannot provide the environment type information.`

### Sub-Tasks

This dataset includes 11 distinct subtasks:

1. **Left hand: adjust the brown towel** (Index: 0)
2. **Left hand: spread the brown towel flat on the table** (Index: 1)
3. **Right hand: spread the brown towel flat on the table** (Index: 2)
4. **End** (Index: 3)
5. **Left hand: fold the brown towel from left to right** (Index: 4)
6. **Right hand: grab the bottom right corner of brown towel** (Index: 5)
7. **Right hand: adjust the brown towel** (Index: 6)
8. **Left hand: fold the brown towel up** (Index: 7)
9. **Right hand: fold the brown towel up** (Index: 8)
10. **Left hand: grab the bottom left corner of brown towel** (Index: 9)
11. **null** (Index: 10)

### Atomic Actions

- `grasp`
- `unfold`
- `fold`
- `pick`
- `place`

## Hardware and Sensors

### Sensors

- `cam_head_rgb`
- `cam_right_wrist_rgb`
- `cam_left_wrist_rgb`

### Camera Information

- `cam_head_rgb`: dtype=video, shape=480x640x3, resolution=640x480, codec=h264, pix_fmt=yuv420p
- `cam_right_wrist_rgb`: dtype=video, shape=480x640x3, resolution=640x480, codec=h264, pix_fmt=yuv420p
- `cam_left_wrist_rgb`: dtype=video, shape=480x640x3, resolution=640x480, codec=h264, pix_fmt=yuv420p

### Coordinate System

- **Definition:** `right-hand-frame`

### Dimensions & Units

- **Joint Rotation:** `radian`
- **End-Effector Rotation:** `radian`
- **End-Effector Translation:** `meter`

## Dataset Statistics

| Metric | Value |
|--------|-------|
| **Total Episodes** | 387 |
| **Total Frames** | 283464 |
| **Total Tasks** | 11 |
| **Total Videos** | 1161 |
| **Total Chunks** | 1 |
| **Chunk Size** | 10000 |
| **FPS** | 30 |
| **State Dimensions** | 26 |
| **Action Dimensions** | 26 |
| **Camera Views** | 3 |
| **Dataset Size** | 18.63 GB |

## Data Splits

The dataset is organized into the following splits:

- **Training**: Episodes 0:386
- **Validation**: Episodes 334:376
- **Test**: Episodes 376:417

## Dataset Structure

This dataset follows the LeRobot format and contains the following components:

### Data Files

- **Videos**: Compressed video files containing RGB camera observations
- **State Data**: Robot joint positions, velocities, and other state information
- **Action Data**: Robot action commands and trajectories
- **Metadata**: Episode metadata, timestamps, and annotations

### File Organization

- **Data Path Pattern**: `data/chunk-{id}/episode_{id}.parquet`
- **Video Path Pattern**: `videos/chunk-{id}/observation.images.cam_head_rgb/episode_{id}.mp4`
- **Chunking**: Data is organized into 1 chunk(s) of size 10000

### Data Structure (Tree)

```
Agilex_Cobot_Magic_fold_towel_brown_qced_hardlink/
|-- annotations
|   |-- eef_acc_mag_annotation.jsonl
|   |-- eef_direction_annotation.jsonl
|   |-- eef_velocity_annotation.jsonl
|   |-- gripper_activity_annotation.jsonl
|   |-- gripper_mode_annotation.jsonl
|   |-- scene_annotations.jsonl
|   `-- subtask_annotations.jsonl
|-- data
|   `-- chunk-000
|       |-- episode_000000.parquet
|       |-- episode_000001.parquet
|       |-- episode_000002.parquet
|       |-- episode_000003.parquet
|       |-- episode_000004.parquet
|       |-- episode_000005.parquet
|       |-- episode_000006.parquet
|       |-- episode_000007.parquet
|       |-- episode_000008.parquet
|       |-- episode_000009.parquet
|       |-- episode_000010.parquet
|       `-- episode_000011.parquet
|       `-- ... (375 more entries)
|-- meta
|   |-- episodes.jsonl
|   |-- episodes_stats.jsonl
|   |-- info.json
|   `-- tasks.jsonl
`-- videos
    `-- chunk-000
        |-- observation.images.cam_head_rgb
        |-- observation.images.cam_left_wrist_rgb
        `-- observation.images.cam_right_wrist_rgb
```

## Camera Views

This dataset includes 3 camera views: `cam_head_rgb`, `cam_right_wrist_rgb`, `cam_left_wrist_rgb`.

## Features (Full YAML)

```yaml
action:
  dtype: float32
  shape:
  - 26
  names:
  - left_arm_joint_1_rad
  - left_arm_joint_2_rad
  - left_arm_joint_3_rad
  - left_arm_joint_4_rad
  - left_arm_joint_5_rad
  - left_arm_joint_6_rad
  - left_eef_pos_x_m
  - left_eef_pos_y_m
  - left_eef_pos_z_m
  - left_eef_rot_euler_x_rad
  - left_eef_rot_euler_y_rad
  - left_eef_rot_euler_z_rad
  - left_gripper_open
  - right_arm_joint_1_rad
  - right_arm_joint_2_rad
  - right_arm_joint_3_rad
  - right_arm_joint_4_rad
  - right_arm_joint_5_rad
  - right_arm_joint_6_rad
  - right_eef_pos_x_m
  - right_eef_pos_y_m
  - right_eef_pos_z_m
  - right_eef_rot_euler_x_rad
  - right_eef_rot_euler_y_rad
  - right_eef_rot_euler_z_rad
  - right_gripper_open
observation.state:
  dtype: float32
  shape:
  - 26
  names:
  - left_arm_joint_1_rad
  - left_arm_joint_2_rad
  - left_arm_joint_3_rad
  - left_arm_joint_4_rad
  - left_arm_joint_5_rad
  - left_arm_joint_6_rad
  - left_eef_pos_x_m
  - left_eef_pos_y_m
  - left_eef_pos_z_m
  - left_eef_rot_euler_x_rad
  - left_eef_rot_euler_y_rad
  - left_eef_rot_euler_z_rad
  - left_gripper_open
  - right_arm_joint_1_rad
  - right_arm_joint_2_rad
  - right_arm_joint_3_rad
  - right_arm_joint_4_rad
  - right_arm_joint_5_rad
  - right_arm_joint_6_rad
  - right_eef_pos_x_m
  - right_eef_pos_y_m
  - right_eef_pos_z_m
  - right_eef_rot_euler_x_rad
  - right_eef_rot_euler_y_rad
  - right_eef_rot_euler_z_rad
  - right_gripper_open
observation.images.cam_head_rgb:
  dtype: video
  shape:
  - 480
  - 640
  - 3
  names:
  - height
  - width
  - channels
  info:
    video.fps: 30.0
    video.height: 480
    video.width: 640
    video.channels: 3
    video.codec: h264
    video.pix_fmt: yuv420p
    video.is_depth_map: false
    has_audio: false
observation.images.cam_right_wrist_rgb:
  dtype: video
  shape:
  - 480
  - 640
  - 3
  names:
  - height
  - width
  - channels
  info:
    video.fps: 30.0
    video.height: 480
    video.width: 640
    video.channels: 3
    video.codec: h264
    video.pix_fmt: yuv420p
    video.is_depth_map: false
    has_audio: false
observation.images.cam_left_wrist_rgb:
  dtype: video
  shape:
  - 480
  - 640
  - 3
  names:
  - height
  - width
  - channels
  info:
    video.fps: 30.0
    video.height: 480
    video.width: 640
    video.channels: 3
    video.codec: h264
    video.pix_fmt: yuv420p
    video.is_depth_map: false
    has_audio: false
timestamp:
  dtype: float32
  shape:
  - 1
  names: null
frame_index:
  dtype: int64
  shape:
  - 1
  names: null
episode_index:
  dtype: int64
  shape:
  - 1
  names: null
index:
  dtype: int64
  shape:
  - 1
  names: null
task_index:
  dtype: int64
  shape:
  - 1
  names: null
subtask_annotation:
  names: null
  dtype: int32
  shape:
  - 5
scene_annotation:
  names: null
  dtype: int32
  shape:
  - 1
eef_sim_pose_state:
  names:
  - left_eef_pos_x
  - left_eef_pos_y
  - left_eef_pos_z
  - left_eef_rot_x
  - left_eef_rot_y
  - left_eef_rot_z
  - right_eef_pos_x
  - right_eef_pos_y
  - right_eef_pos_z
  - right_eef_rot_x
  - right_eef_rot_y
  - right_eef_rot_z
  dtype: float32
  shape:
  - 12
eef_sim_pose_action:
  names:
  - left_eef_pos_x
  - left_eef_pos_y
  - left_eef_pos_z
  - left_eef_rot_x
  - left_eef_rot_y
  - left_eef_rot_z
  - right_eef_pos_x
  - right_eef_pos_y
  - right_eef_pos_z
  - right_eef_rot_x
  - right_eef_rot_y
  - right_eef_rot_z
  dtype: float32
  shape:
  - 12
eef_direction_state:
  names:
  - left_eef_direction
  - right_eef_direction
  dtype: int32
  shape:
  - 2
eef_direction_action:
  names:
  - left_eef_direction
  - right_eef_direction
  dtype: int32
  shape:
  - 2
eef_velocity_state:
  names:
  - left_eef_velocity
  - right_eef_velocity
  dtype: int32
  shape:
  - 2
eef_velocity_action:
  names:
  - left_eef_velocity
  - right_eef_velocity
  dtype: int32
  shape:
  - 2
eef_acc_mag_state:
  names:
  - left_eef_acc_mag
  - right_eef_acc_mag
  dtype: int32
  shape:
  - 2
eef_acc_mag_action:
  names:
  - left_eef_acc_mag
  - right_eef_acc_mag
  dtype: int32
  shape:
  - 2
gripper_mode_state:
  names:
  - left_gripper_mode
  - right_gripper_mode
  dtype: int32
  shape:
  - 2
gripper_mode_action:
  names:
  - left_gripper_mode
  - right_gripper_mode
  dtype: int32
  shape:
  - 2
gripper_activity_state:
  names:
  - left_gripper_activity
  - right_gripper_activity
  dtype: int32
  shape:
  - 2
gripper_activity_action:
  names:
  - left_gripper_activity
  - right_gripper_activity
  dtype: int32
  shape:
  - 2
gripper_open_scale_state:
  names:
  - left_gripper_open_scale
  - right_gripper_open_scale
  dtype: float32
  shape:
  - 2
gripper_open_scale_action:
  names:
  - left_gripper_open_scale
  - right_gripper_open_scale
  dtype: float32
  shape:
  - 2
```

## Available Annotations

This dataset includes rich annotations to support diverse learning approaches:

- `eef_acc_mag_annotation.jsonl`
- `eef_direction_annotation.jsonl`
- `eef_velocity_annotation.jsonl`
- `gripper_activity_annotation.jsonl`
- `gripper_mode_annotation.jsonl`
- `scene_annotations.jsonl`
- `subtask_annotations.jsonl`

## Dataset Tags

- `RoboCOIN`
- `LeRobot`

## Authors

### Contributors

This dataset is contributed by:

- RoboCOIN Team at Beijing Academy of Artificial Intelligence (BAAI)

### Annotators

No annotator information available.
## Links

- **Homepage:** [https://flagopen.github.io/RoboCOIN/](https://flagopen.github.io/RoboCOIN/)
- **Paper:** [https://arxiv.org/abs/2511.17441](https://arxiv.org/abs/2511.17441)
- **Repository:** [https://github.com/FlagOpen/RoboCOIN](https://github.com/FlagOpen/RoboCOIN)

## Contact and Support

For questions, issues, or feedback regarding this dataset, please contact us.

### Support

For technical support, please open an issue on our GitHub repository.

## License

apache-2.0

## Citation

If you use this dataset in your research, please cite:

```bibtex
@article{robocoin,
  title={RoboCOIN: An Open-Sourced Bimanual Robotic Data Collection for Integrated Manipulation},
  author={Shihan Wu, Xuecheng Liu, Shaoxuan Xie, Pengwei Wang, Xinghang Li, Bowen Yang, Zhe Li, Kai Zhu, Hongyu Wu, Yiheng Liu, Zhaoye Long, Yue Wang, Chong Liu, Dihan Wang, Ziqiang Ni, Xiang Yang, You Liu, Ruoxuan Feng, Runtian Xu, Lei Zhang, Denghang Huang, Chenghao Jin, Anlan Yin, Xinlong Wang, Zhenguo Sun, Junkai Zhao, Mengfei Du, Mingyu Cao, Xiansheng Chen, Hongyang Cheng, Xiaojie Zhang, Yankai Fu, Ning Chen, Cheng Chi, Sixiang Chen, Huaihai Lyu, Xiaoshuai Hao, Yequan Wang, Bo Lei, Dong Liu, Xi Yang, Yance Jiao, Tengfei Pan, Yunyan Zhang, Songjing Wang, Ziqian Zhang, Xu Liu, Ji Zhang, Caowei Meng, Zhizheng Zhang, Jiyang Gao, Song Wang, Xiaokun Leng, Zhiqiang Xie, Zhenzhen Zhou, Peng Huang, Wu Yang, Yandong Guo, Yichao Zhu, Suibing Zheng, Hao Cheng, Xinmin Ding, Yang Yue, Huanqian Wang, Chi Chen, Jingrui Pang, YuXi Qian, Haoran Geng, Lianli Gao, Haiyuan Li, Bin Fang, Gao Huang, Yaodong Yang, Hao Dong, He Wang, Hang Zhao, Yadong Mu, Di Hu, Hao Zhao, Tiejun Huang, Shanghang Zhang, Yonghua Lin, Zhongyuan Wang and Guocai Yao},
  journal={arXiv preprint arXiv:2511.17441},
  url = {https://arxiv.org/abs/2511.17441},
  year={2025},
}
```

### Additional References

If you use this dataset, please also consider citing:

LeRobot Framework: https://github.com/huggingface/lerobot

## Version Information

Initial Release

Provider:

RoboCOIN

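Collected Summary

Dataset Introduction

Construction

High-quality datasets are essential for advancing dexterous manipulation algorithms. Agilex_Cobot_Magic_fold_towel_brown is built on an extended LeRobot format and remains fully compatible with LeRobot. It was collected on the Agilex_Cobot_Magic dual-arm robot performing a towel-folding task in a staged household bathroom scene (`household->bathroom`), and it contains 387 complete episodes. Recording combines three RGB camera streams (head and both wrists) with the robot's 26-dimensional state and action vectors at 30 frames per second, giving a structured dataset of 18.63 GB.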

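Features

The dataset's core strength is its detailed record of bimanual coordination combined with multimodal data. For each arm it logs six joint angles, the end-effector position and Euler rotation, and the gripper opening, which together form the 26-dimensional continuous state and action spaces. Three synchronized RGB video streams from the head, right wrist, and left wrist provide visual context for vision-based imitation learning and policy learning. The dataset also ships annotations of end-effector direction, velocity, and acceleration magnitude, of gripper mode and activity, and of the scene. It breaks the towel-folding task into 11 subtasks, which supports methods such as hierarchical reinforcement learning.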
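
The 26-dimensional layout can be made concrete with a small helper. The sketch below splits one `action` or `observation.state` row into named parts, following the order of the `names` list in the feature YAML; the group names such as `left_joints_rad` and the helper functions are illustrative and not part of the dataset or of LeRobot.

```python
from typing import Dict, List, Sequence

# Per-arm field widths, in the order given by the `names` list of the
# `action` and `observation.state` features. Group names are hypothetical.
ARM_LAYOUT = [
    ("joints_rad", 6),         # arm_joint_1_rad .. arm_joint_6_rad
    ("eef_pos_m", 3),          # eef_pos_x_m, eef_pos_y_m, eef_pos_z_m
    ("eef_rot_euler_rad", 3),  # eef_rot_euler_x_rad .. eef_rot_euler_z_rad
    ("gripper_open", 1),       # gripper_open
]


def build_slices() -> Dict[str, slice]:
    """Map names like 'left_joints_rad' to slices of the 26-dim vector."""
    slices: Dict[str, slice] = {}
    offset = 0
    for side in ("left", "right"):  # left arm comes first in the YAML
        for field, width in ARM_LAYOUT:
            slices[f"{side}_{field}"] = slice(offset, offset + width)
            offset += width
    return slices


def split_vector(vec: Sequence[float]) -> Dict[str, List[float]]:
    """Split one 26-dim row into its named parts."""
    if len(vec) != 26:
        raise ValueError(f"expected 26 values, got {len(vec)}")
    return {name: list(vec[s]) for name, s in build_slices().items()}


# Dummy row whose values equal their own indices, to make the layout visible.
parts = split_vector(list(range(26)))
print(parts["left_eef_pos_m"])      # [6, 7, 8]
print(parts["right_gripper_open"])  # [25]
```

Keeping the layout in one table means that the same slices apply to both the state and the action vectors, since the card lists identical names for the two.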

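Usage

Researchers can load and process the data through its standard LeRobot format. The dataset card lists training, validation, and test splits by episode index range. State, action, and timestamp data are stored in the Parquet files under `data/chunk-{id}/`, and the multi-view videos under `videos/` supply the visual input. The annotation files, such as the subtask and scene annotations, support supervised learning, behavior cloning, and goal-conditioned policy learning. In practice, the LeRobot toolchain is the recommended route for preprocessing, model training, and simulation-based validation.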
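
To locate the files of a given episode, the path patterns and split ranges from the dataset card can be expanded as in the sketch below. Several points are assumptions rather than facts from the card: the chunk number is taken to be `episode_index // chunk_size`, the zero-padding widths are read off the directory tree, the `.mp4` extension comes from the preview link, and `start:end` ranges are treated as half-open. As listed, the ranges overlap and the test range runs past the 387 available episodes, so this sketch clips them; the helper names are hypothetical.

```python
from typing import Dict, List

CHUNK_SIZE = 10000    # "Chunk Size" from the statistics table
TOTAL_EPISODES = 387  # "Total Episodes" from the statistics table
CAMERAS = ["cam_head_rgb", "cam_right_wrist_rgb", "cam_left_wrist_rgb"]
SPLITS = {"train": "0:386", "validation": "334:376", "test": "376:417"}


def episode_paths(episode_index: int) -> Dict[str, str]:
    """Relative paths of the Parquet file and the three videos of one episode."""
    # Assumption: episodes are grouped into chunks of CHUNK_SIZE by index.
    chunk = episode_index // CHUNK_SIZE
    paths = {"data": f"data/chunk-{chunk:03d}/episode_{episode_index:06d}.parquet"}
    for cam in CAMERAS:
        paths[cam] = (
            f"videos/chunk-{chunk:03d}/observation.images.{cam}/"
            f"episode_{episode_index:06d}.mp4"
        )
    return paths


def split_episodes(spec: str, total: int = TOTAL_EPISODES) -> List[int]:
    """Expand a 'start:end' string (assumed half-open) and clip it to `total`."""
    start, end = (int(part) for part in spec.split(":"))
    return list(range(start, min(end, total)))


print(episode_paths(0)["data"])
for name, spec in SPLITS.items():
    print(name, len(split_episodes(spec)))
```

The resulting data path can then be read with any Parquet reader; checking the split sizes before training is a cheap way to notice overlapping or out-of-range indices.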

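Background and Challenges

Background Overview

Dexterous bimanual coordination on complex everyday tasks is a core challenge for embodied intelligence. Agilex_Cobot_Magic_fold_towel_brown was contributed by the RoboCOIN team at the Beijing Academy of Artificial Intelligence (BAAI), whose accompanying RoboCOIN paper is dated 2025. It targets cloth folding in a household setting and aims to supply high-quality, multimodal demonstration data for robot learning. The central research question is how a robot can learn from human demonstrations to carry out sequential tasks that involve deformable objects and precise spatial manipulation. Built on the LeRobot framework, the dataset contains 387 complete demonstration episodes with 283,464 frames of multi-view video and state data. It serves as a benchmark resource for imitation learning, reinforcement learning, and policy generalization research on household service robots.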

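Current Challenges

The dataset addresses a basic difficulty in robot manipulation, the control of deformable objects: folding a soft, easily deformed towel precisely and repeatably. Algorithms must handle high-dimensional visual and state spaces and also model the dynamic physical interaction between the object and its environment. Building the dataset poses its own challenges. High-quality bimanual teleoperation data is expensive to collect, and the three RGB cameras must be tightly synchronized and calibrated in time and space. Decomposing a continuous manipulation into 11 clearly defined subtasks and annotating them takes substantial manual effort and domain knowledge. Finally, the limited scale of 387 episodes places demands on the sample efficiency and generalization of learned models.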
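
One simple synchronization check is to compare each frame's `timestamp` against its `frame_index` at the stated 30 FPS. The sketch below assumes that `timestamp` is measured in seconds from the start of the episode, which the dataset card does not state explicitly; the function name and tolerance are illustrative.

```python
from typing import List, Sequence

FPS = 30  # from the dataset card


def timing_errors(
    frame_index: Sequence[int],
    timestamp: Sequence[float],
    fps: int = FPS,
    tol: float = 1e-3,
) -> List[int]:
    """Return frame indices whose timestamp is more than `tol` seconds
    away from frame_index / fps (assumed episode-relative seconds)."""
    if len(frame_index) != len(timestamp):
        raise ValueError("frame_index and timestamp must have equal length")
    return [i for i, t in zip(frame_index, timestamp) if abs(t - i / fps) > tol]


# Toy episode: the last timestamp is deliberately wrong (expected 0.1 s).
frames = [0, 1, 2, 3]
stamps = [0.0, 1 / 30, 2 / 30, 0.2]
print(timing_errors(frames, stamps))  # [3]
```

The tolerance is kept loose because `timestamp` is stored as `float32`, so exact equality with `frame_index / fps` should not be expected.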

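Common Scenarios

Classic Use Cases

In robot manipulation learning, Agilex_Cobot_Magic_fold_towel_brown is a representative example of bimanual coordination. It focuses on a fine manipulation task in a household setting: folding a towel on a table three times with two-finger grippers. Its 387 complete episodes, with multi-view video from the head and both wrist cameras, record the arms' state and action sequences as the towel deforms. This setup is commonly used to train and evaluate imitation learning and reinforcement learning algorithms on bimanual coordination, particularly on dexterous handling of deformable objects.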

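Academic Problems Addressed

The dataset addresses a core robotics challenge: coordinated bimanual manipulation of deformable objects. Much prior work concentrates on grasping and placing rigid objects, whereas manipulating a deformable object such as a towel involves complex dynamics and changing geometry. The subtask annotations (spreading, grabbing a corner, folding), together with the listed atomic actions, provide structured data for studying state-action mappings during manipulation. This helps with action segmentation, task planning, and multimodal perception fusion, and supports the move from simple single-arm manipulation toward complex bimanual collaboration.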
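
For action segmentation, per-frame subtask labels are often collapsed into contiguous segments. The sketch below does this with run-length grouping, using the subtask names and indices from the dataset card. It assumes that a single subtask index per frame has already been extracted; the `subtask_annotation` feature has shape 5 and the card does not document its encoding, so that extraction step is left out. `Segment` and `segment_episode` are hypothetical names.

```python
from itertools import groupby
from typing import List, NamedTuple, Sequence

# Subtask names by index, as listed in the dataset card.
SUBTASKS = [
    "Left hand: adjust the brown towel",
    "Left hand: spread the brown towel flat on the table",
    "Right hand: spread the brown towel flat on the table",
    "End",
    "Left hand: fold the brown towel from left to right",
    "Right hand: grab the bottom right corner of brown towel",
    "Right hand: adjust the brown towel",
    "Left hand: fold the brown towel up",
    "Right hand: fold the brown towel up",
    "Left hand: grab the bottom left corner of brown towel",
    "null",
]


class Segment(NamedTuple):
    subtask: str
    start_frame: int  # inclusive
    end_frame: int    # exclusive


def segment_episode(labels: Sequence[int]) -> List[Segment]:
    """Collapse per-frame subtask indices into contiguous segments."""
    segments: List[Segment] = []
    start = 0
    for index, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append(Segment(SUBTASKS[index], start, start + length))
        start += length
    return segments


# Toy label sequence, not taken from the dataset.
for seg in segment_episode([1, 1, 1, 4, 4, 7, 7, 7, 7, 3]):
    print(seg)
```

Dividing the frame boundaries by the 30 FPS rate converts each segment to seconds.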

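Derived and Related Work

Work building on this dataset and the wider RoboCOIN project focuses on bimanual manipulation and imitation learning. Such work typically uses the multimodal data to explore end-to-end policy learning, vision-based action prediction, and hierarchical task planning. For example, a study might borrow the subtask annotation structure to develop models that understand and execute long manipulation sequences. Because the dataset follows the LeRobot format, it can also be combined with other community robot datasets, which encourages broader benchmarks and algorithm comparisons.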

The above content was collected and summarized by 遇见数据集.
