
RoboCOIN/Agilex_Cobot_Magic_storage_towel

Hugging Face | Updated 2026-04-02 | Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/RoboCOIN/Agilex_Cobot_Magic_storage_towel
Resource description:
---
task_categories:
- robotics
language:
- en
extra_gated_prompt: 'By accessing this dataset, you agree to cite the associated paper in your research/publications; see the "Citation" section for details. You agree to not use the dataset to conduct experiments that cause harm to human subjects.'
extra_gated_fields:
  Company/Organization:
    type: 'text'
    description: 'e.g., "ETH Zurich", "Boston Dynamics", "Independent Researcher"'
  Country:
    type: 'country'
    description: 'e.g., "Germany", "China", "United States"'
tags:
- RoboCOIN
- LeRobot
license: apache-2.0
configs:
- config_name: default
  data_files: data/chunk-{id}/episode_{id}.parquet
---

# Agilex_Cobot_Magic_storage_towel

## Dataset Description

This dataset uses an extended format based on LeRobot and is fully compatible with LeRobot.

## Task Preview

<video src="videos/chunk-000/observation.images.cam_head_rgb/episode_000000.mp4" controls width="640"></video>

[View Video Directly](videos/chunk-000/observation.images.cam_head_rgb/episode_000000.mp4)

### Overview

- **Total Episodes:** 1176
- **Total Frames:** 512832
- **FPS:** 30
- **Dataset Size:** 34.54 GB
- **Robot Name:** `Agilex_Cobot_Magic`
- **End-Effector Type:** `two_finger_gripper`
- **Teleoperation Type:** `Due to some reasons, this dataset temporarily cannot provide the teleoperation type information.`
- **Sensors:** `cam_head_rgb`, `cam_right_wrist_rgb`, `cam_left_wrist_rgb`
- **Camera Information:** cam_head_rgb; cam_right_wrist_rgb; cam_left_wrist_rgb
- **Scene:** `office_workspace->office`
- **Objects:** `table(unknown)`, `basket(unknown)`, `towel(unknown)`
- **Task Description:** pick up the folded towels and put them in the basket.

### Primary Task Instruction

> pick up the folded towels and put them in the basket.

### Robot Configuration

- **Robot Name:** `Agilex_Cobot_Magic`
- **Codebase Version:** `v2.1`
- **End-Effector Type:** `two_finger_gripper`
- **Teleoperation Type:** `Due to some reasons, this dataset temporarily cannot provide the teleoperation type information.`

## Scene and Objects

### Scene Type

`office_workspace->office`

### Objects

- `table(unknown)`
- `basket(unknown)`
- `towel(unknown)`

## Task Descriptions

- **Standardized Task Description:** `pick up the folded towels and put them in the basket.`
- **Operation Type:** `Due to some reasons, this dataset temporarily cannot provide the operation type information.`
- **Environment Type:** `Due to some reasons, this dataset temporarily cannot provide the environment type information.`

### Sub-Tasks

This dataset includes 10 distinct subtasks:

1. **Right hand: grab the folded purple towel** (Index: 0)
2. **Right hand: grab the folded brown towel** (Index: 1)
3. **Right hand: grab the folded grey towel** (Index: 2)
4. **Right hand: place the blue towel in right basket** (Index: 3)
5. **End** (Index: 4)
6. **Right hand: place the grey towel in right basket** (Index: 5)
7. **Right hand: place the purple towel in right basket** (Index: 6)
8. **Right hand: place the brown towel in right basket** (Index: 7)
9. **Right hand: grab the folded blue towel** (Index: 8)
10. **null** (Index: 9)

### Atomic Actions

- `grasp`
- `fold`
- `lift`
- `lower`

## Hardware and Sensors

### Sensors

- `cam_head_rgb`
- `cam_right_wrist_rgb`
- `cam_left_wrist_rgb`

### Camera Information

- `cam_head_rgb`: dtype=video, shape=480x640x3, resolution=640x480, codec=h264, pix_fmt=yuv420p
- `cam_right_wrist_rgb`: dtype=video, shape=480x640x3, resolution=640x480, codec=h264, pix_fmt=yuv420p
- `cam_left_wrist_rgb`: dtype=video, shape=480x640x3, resolution=640x480, codec=h264, pix_fmt=yuv420p

### Coordinate System

- **Definition:** `right-hand-frame`

### Dimensions & Units

- **Joint Rotation:** `radian`
- **End-Effector Rotation:** `radian`
- **End-Effector Translation:** `meter`

## Dataset Statistics

| Metric | Value |
|--------|-------|
| **Total Episodes** | 1176 |
| **Total Frames** | 512832 |
| **Total Tasks** | 10 |
| **Total Videos** | 3528 |
| **Total Chunks** | 1 |
| **Chunk Size** | 10000 |
| **FPS** | 30 |
| **State Dimensions** | 26 |
| **Action Dimensions** | 26 |
| **Camera Views** | 3 |
| **Dataset Size** | 34.54 GB |

## Data Splits

The dataset is organized into the following splits:

- **Training**: Episodes 0:1175
- **Validation**: Episodes 960:1080
- **Test**: Episodes 1080:1201

## Dataset Structure

This dataset follows the LeRobot format and contains the following components:

### Data Files

- **Videos**: Compressed video files containing RGB camera observations
- **State Data**: Robot joint positions, velocities, and other state information
- **Action Data**: Robot action commands and trajectories
- **Metadata**: Episode metadata, timestamps, and annotations

### File Organization

- **Data Path Pattern**: `data/chunk-{id}/episode_{id}.parquet`
- **Video Path Pattern**: `videos/chunk-{id}/observation.images.cam_head_rgb/episode_{id}.mp4`
- **Chunking**: Data is organized into 1 chunk(s) of size 10000

### Data Structure (Tree)

```
Agilex_Cobot_Magic_storage_towel_qced_hardlink/
|-- annotations
|   |-- eef_acc_mag_annotation.jsonl
|   |-- eef_direction_annotation.jsonl
|   |-- eef_velocity_annotation.jsonl
|   |-- gripper_activity_annotation.jsonl
|   |-- gripper_mode_annotation.jsonl
|   |-- scene_annotations.jsonl
|   `-- subtask_annotations.jsonl
|-- data
|   `-- chunk-000
|       |-- episode_000000.parquet
|       |-- episode_000001.parquet
|       |-- episode_000002.parquet
|       |-- episode_000003.parquet
|       |-- episode_000004.parquet
|       |-- episode_000005.parquet
|       |-- episode_000006.parquet
|       |-- episode_000007.parquet
|       |-- episode_000008.parquet
|       |-- episode_000009.parquet
|       |-- episode_000010.parquet
|       `-- episode_000011.parquet
|       `-- ... (1164 more entries)
|-- meta
|   |-- episodes.jsonl
|   |-- episodes_stats.jsonl
|   |-- info.json
|   `-- tasks.jsonl
`-- videos
    `-- chunk-000
        |-- observation.images.cam_head_rgb
        |-- observation.images.cam_left_wrist_rgb
        `-- observation.images.cam_right_wrist_rgb
```

## Camera Views

This dataset includes 3 camera views: `cam_head_rgb`, `cam_right_wrist_rgb`, `cam_left_wrist_rgb`.

## Features (Full YAML)

```yaml
action:
  dtype: float32
  shape:
  - 26
  names:
  - left_arm_joint_1_rad
  - left_arm_joint_2_rad
  - left_arm_joint_3_rad
  - left_arm_joint_4_rad
  - left_arm_joint_5_rad
  - left_arm_joint_6_rad
  - left_eef_pos_x_m
  - left_eef_pos_y_m
  - left_eef_pos_z_m
  - left_eef_rot_euler_x_rad
  - left_eef_rot_euler_y_rad
  - left_eef_rot_euler_z_rad
  - left_gripper_open
  - right_arm_joint_1_rad
  - right_arm_joint_2_rad
  - right_arm_joint_3_rad
  - right_arm_joint_4_rad
  - right_arm_joint_5_rad
  - right_arm_joint_6_rad
  - right_eef_pos_x_m
  - right_eef_pos_y_m
  - right_eef_pos_z_m
  - right_eef_rot_euler_x_rad
  - right_eef_rot_euler_y_rad
  - right_eef_rot_euler_z_rad
  - right_gripper_open
observation.state:
  dtype: float32
  shape:
  - 26
  names:
  - left_arm_joint_1_rad
  - left_arm_joint_2_rad
  - left_arm_joint_3_rad
  - left_arm_joint_4_rad
  - left_arm_joint_5_rad
  - left_arm_joint_6_rad
  - left_eef_pos_x_m
  - left_eef_pos_y_m
  - left_eef_pos_z_m
  - left_eef_rot_euler_x_rad
  - left_eef_rot_euler_y_rad
  - left_eef_rot_euler_z_rad
  - left_gripper_open
  - right_arm_joint_1_rad
  - right_arm_joint_2_rad
  - right_arm_joint_3_rad
  - right_arm_joint_4_rad
  - right_arm_joint_5_rad
  - right_arm_joint_6_rad
  - right_eef_pos_x_m
  - right_eef_pos_y_m
  - right_eef_pos_z_m
  - right_eef_rot_euler_x_rad
  - right_eef_rot_euler_y_rad
  - right_eef_rot_euler_z_rad
  - right_gripper_open
observation.images.cam_head_rgb:
  dtype: video
  shape:
  - 480
  - 640
  - 3
  names:
  - height
  - width
  - channels
  info:
    video.fps: 30.0
    video.height: 480
    video.width: 640
    video.channels: 3
    video.codec: h264
    video.pix_fmt: yuv420p
    video.is_depth_map: false
    has_audio: false
observation.images.cam_right_wrist_rgb:
  dtype: video
  shape:
  - 480
  - 640
  - 3
  names:
  - height
  - width
  - channels
  info:
    video.fps: 30.0
    video.height: 480
    video.width: 640
    video.channels: 3
    video.codec: h264
    video.pix_fmt: yuv420p
    video.is_depth_map: false
    has_audio: false
observation.images.cam_left_wrist_rgb:
  dtype: video
  shape:
  - 480
  - 640
  - 3
  names:
  - height
  - width
  - channels
  info:
    video.fps: 30.0
    video.height: 480
    video.width: 640
    video.channels: 3
    video.codec: h264
    video.pix_fmt: yuv420p
    video.is_depth_map: false
    has_audio: false
timestamp:
  dtype: float32
  shape:
  - 1
  names: null
frame_index:
  dtype: int64
  shape:
  - 1
  names: null
episode_index:
  dtype: int64
  shape:
  - 1
  names: null
index:
  dtype: int64
  shape:
  - 1
  names: null
task_index:
  dtype: int64
  shape:
  - 1
  names: null
subtask_annotation:
  names: null
  dtype: int32
  shape:
  - 5
scene_annotation:
  names: null
  dtype: int32
  shape:
  - 1
eef_sim_pose_state:
  names:
  - left_eef_pos_x
  - left_eef_pos_y
  - left_eef_pos_z
  - left_eef_rot_x
  - left_eef_rot_y
  - left_eef_rot_z
  - right_eef_pos_x
  - right_eef_pos_y
  - right_eef_pos_z
  - right_eef_rot_x
  - right_eef_rot_y
  - right_eef_rot_z
  dtype: float32
  shape:
  - 12
eef_sim_pose_action:
  names:
  - left_eef_pos_x
  - left_eef_pos_y
  - left_eef_pos_z
  - left_eef_rot_x
  - left_eef_rot_y
  - left_eef_rot_z
  - right_eef_pos_x
  - right_eef_pos_y
  - right_eef_pos_z
  - right_eef_rot_x
  - right_eef_rot_y
  - right_eef_rot_z
  dtype: float32
  shape:
  - 12
eef_direction_state:
  names:
  - left_eef_direction
  - right_eef_direction
  dtype: int32
  shape:
  - 2
eef_direction_action:
  names:
  - left_eef_direction
  - right_eef_direction
  dtype: int32
  shape:
  - 2
eef_velocity_state:
  names:
  - left_eef_velocity
  - right_eef_velocity
  dtype: int32
  shape:
  - 2
eef_velocity_action:
  names:
  - left_eef_velocity
  - right_eef_velocity
  dtype: int32
  shape:
  - 2
eef_acc_mag_state:
  names:
  - left_eef_acc_mag
  - right_eef_acc_mag
  dtype: int32
  shape:
  - 2
eef_acc_mag_action:
  names:
  - left_eef_acc_mag
  - right_eef_acc_mag
  dtype: int32
  shape:
  - 2
gripper_mode_state:
  names:
  - left_gripper_mode
  - right_gripper_mode
  dtype: int32
  shape:
  - 2
gripper_mode_action:
  names:
  - left_gripper_mode
  - right_gripper_mode
  dtype: int32
  shape:
  - 2
gripper_activity_state:
  names:
  - left_gripper_activity
  - right_gripper_activity
  dtype: int32
  shape:
  - 2
gripper_activity_action:
  names:
  - left_gripper_activity
  - right_gripper_activity
  dtype: int32
  shape:
  - 2
gripper_open_scale_state:
  names:
  - left_gripper_open_scale
  - right_gripper_open_scale
  dtype: float32
  shape:
  - 2
gripper_open_scale_action:
  names:
  - left_gripper_open_scale
  - right_gripper_open_scale
  dtype: float32
  shape:
  - 2
```

## Available Annotations

This dataset includes rich annotations to support diverse learning approaches:

- `eef_acc_mag_annotation.jsonl`
- `eef_direction_annotation.jsonl`
- `eef_velocity_annotation.jsonl`
- `gripper_activity_annotation.jsonl`
- `gripper_mode_annotation.jsonl`
- `scene_annotations.jsonl`
- `subtask_annotations.jsonl`

## Dataset Tags

- `RoboCOIN`
- `LeRobot`

## Authors

### Contributors

This dataset is contributed by:

- RoboCOIN Team at Beijing Academy of Artificial Intelligence (BAAI)

### Annotators

No annotator information available.

## Links

- **Homepage:** [https://flagopen.github.io/RoboCOIN/](https://flagopen.github.io/RoboCOIN/)
- **Paper:** [https://arxiv.org/abs/2511.17441](https://arxiv.org/abs/2511.17441)
- **Repository:** [https://github.com/FlagOpen/RoboCOIN](https://github.com/FlagOpen/RoboCOIN)

## Contact and Support

For questions, issues, or feedback regarding this dataset, please contact us.

### Support

For technical support, please open an issue on our GitHub repository.

## License

apache-2.0

## Citation

If you use this dataset in your research, please cite:

```bibtex
@article{robocoin,
  title={RoboCOIN: An Open-Sourced Bimanual Robotic Data Collection for Integrated Manipulation},
  author={Shihan Wu, Xuecheng Liu, Shaoxuan Xie, Pengwei Wang, Xinghang Li, Bowen Yang, Zhe Li, Kai Zhu, Hongyu Wu, Yiheng Liu, Zhaoye Long, Yue Wang, Chong Liu, Dihan Wang, Ziqiang Ni, Xiang Yang, You Liu, Ruoxuan Feng, Runtian Xu, Lei Zhang, Denghang Huang, Chenghao Jin, Anlan Yin, Xinlong Wang, Zhenguo Sun, Junkai Zhao, Mengfei Du, Mingyu Cao, Xiansheng Chen, Hongyang Cheng, Xiaojie Zhang, Yankai Fu, Ning Chen, Cheng Chi, Sixiang Chen, Huaihai Lyu, Xiaoshuai Hao, Yequan Wang, Bo Lei, Dong Liu, Xi Yang, Yance Jiao, Tengfei Pan, Yunyan Zhang, Songjing Wang, Ziqian Zhang, Xu Liu, Ji Zhang, Caowei Meng, Zhizheng Zhang, Jiyang Gao, Song Wang, Xiaokun Leng, Zhiqiang Xie, Zhenzhen Zhou, Peng Huang, Wu Yang, Yandong Guo, Yichao Zhu, Suibing Zheng, Hao Cheng, Xinmin Ding, Yang Yue, Huanqian Wang, Chi Chen, Jingrui Pang, YuXi Qian, Haoran Geng, Lianli Gao, Haiyuan Li, Bin Fang, Gao Huang, Yaodong Yang, Hao Dong, He Wang, Hang Zhao, Yadong Mu, Di Hu, Hao Zhao, Tiejun Huang, Shanghang Zhang, Yonghua Lin, Zhongyuan Wang and Guocai Yao},
  journal={arXiv preprint arXiv:2511.17441},
  url = {https://arxiv.org/abs/2511.17441},
  year={2025},
}
```

### Additional References

If you use this dataset, please also consider citing:

- LeRobot Framework: https://github.com/huggingface/lerobot

## Version Information

Initial Release

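The 26-dimensional `action` and `observation.state` vectors listed under "Features (Full YAML)" share one layout: 13 values for the left arm followed by 13 for the right. The following is a minimal sketch of splitting such a vector into named groups. The helper name `split_vector` and the group keys are illustrative and not part of the dataset or of LeRobot; only the field order is taken from the YAML.

```python
# Per-arm field order, copied from the feature names in the dataset card.
ARM_FIELDS = (
    [f"arm_joint_{i}_rad" for i in range(1, 7)]
    + ["eef_pos_x_m", "eef_pos_y_m", "eef_pos_z_m"]
    + ["eef_rot_euler_x_rad", "eef_rot_euler_y_rad", "eef_rot_euler_z_rad"]
    + ["gripper_open"]
)
# Full 26-name list: all left-arm fields first, then all right-arm fields.
NAMES = [f"{side}_{field}" for side in ("left", "right") for field in ARM_FIELDS]


def split_vector(vec):
    """Split one 26-dim state/action vector into per-arm groups (hypothetical helper)."""
    if len(vec) != len(NAMES):
        raise ValueError(f"expected {len(NAMES)} values, got {len(vec)}")
    groups = {}
    for offset, side in ((0, "left"), (len(ARM_FIELDS), "right")):
        groups[side] = {
            "joints": list(vec[offset:offset + 6]),              # radians
            "eef_pos": list(vec[offset + 6:offset + 9]),         # meters
            "eef_rot_euler": list(vec[offset + 9:offset + 12]),  # radians
            "gripper_open": vec[offset + 12],
        }
    return groups


# Usage with a dummy vector whose values equal their own indices.
demo = split_vector(list(range(26)))
print(demo["right"]["eef_pos"])
```

Slicing by fixed offsets keeps the sketch dependency-free; a loader that reads `meta/info.json` could derive the same offsets from the stored names instead.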
Provider:
RoboCOIN
Curated Summary
Dataset Introduction
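Construction
In robotic manipulation, high-quality datasets are essential for advancing imitation learning and reinforcement learning algorithms. The Agilex_Cobot_Magic_storage_towel dataset uses an extended format based on the LeRobot framework, which keeps it fully compatible with the existing robot learning ecosystem. It collects demonstrations of the bimanual Agilex_Cobot_Magic robot performing a towel storage task in an office scene. Multi-view visual sensors record each operation synchronously. State and action data are stored as structured Parquet files, alongside the video streams and annotation files.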
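As a rough illustration of that layout, the sketch below builds the relative file paths for one episode from the path patterns and chunk size given in the dataset card. The function name `episode_files` is hypothetical, and the rule that the chunk number equals `episode_index // chunk_size` is an assumption based on the LeRobot v2.x convention, not something the card states.

```python
from pathlib import PurePosixPath

CHUNK_SIZE = 10000  # "Chunk Size" from the dataset statistics
CAMERAS = ("cam_head_rgb", "cam_left_wrist_rgb", "cam_right_wrist_rgb")


def episode_files(episode_index: int, chunk_size: int = CHUNK_SIZE) -> dict:
    """Return relative parquet and video paths for one episode (hypothetical helper)."""
    if episode_index < 0:
        raise ValueError("episode_index must be non-negative")
    # Assumption: episodes are assigned to chunks by integer division.
    chunk = f"chunk-{episode_index // chunk_size:03d}"
    name = f"episode_{episode_index:06d}"
    paths = {"data": str(PurePosixPath("data") / chunk / f"{name}.parquet")}
    for cam in CAMERAS:
        video_dir = PurePosixPath("videos") / chunk / f"observation.images.{cam}"
        paths[cam] = str(video_dir / f"{name}.mp4")
    return paths


print(episode_files(11)["data"])
```

With 1176 episodes and a chunk size of 10000, every episode falls in `chunk-000`, which matches the directory tree shown in the card.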
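Characteristics
The dataset contains 1176 complete episodes totaling 512,832 frames, recorded at 30 frames per second, and captures the robot grasping and placing folded towels. It provides three synchronized RGB camera views (head, left wrist, and right wrist), giving algorithms rich visual context. States and actions are each described by 26-dimensional vectors, and multi-level annotations range from end-effector motion to gripper activity modes, which supports end-to-end learning of complex policies.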
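A quick arithmetic check puts these figures in more familiar units. The constants below are copied from the dataset card; the derived values (total recording time, mean episode length) are computed here and are not stated by the source.

```python
TOTAL_EPISODES = 1176
TOTAL_FRAMES = 512832
FPS = 30
CAMERA_VIEWS = 3

total_seconds = TOTAL_FRAMES / FPS            # recording time per camera view
total_hours = total_seconds / 3600
mean_frames = TOTAL_FRAMES / TOTAL_EPISODES   # average frames per episode
mean_seconds = mean_frames / FPS
total_videos = TOTAL_EPISODES * CAMERA_VIEWS  # one video per episode per camera

print(f"{total_hours:.2f} h in total, {mean_seconds:.1f} s per episode on average")
print(f"{total_videos} video files")
```

The computed video count agrees with the "Total Videos" value of 3528 in the card's statistics table.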
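Usage
The dataset is pre-divided into training, validation, and test splits and follows a clear directory structure. Users can load the Parquet files at the given paths to access state, action, and timestamp sequences directly, while the video streams are stored separately in their own directories. Because the dataset is natively compatible with LeRobot, researchers can use that framework's data loaders to build data pipelines for training models such as behavior cloning and offline reinforcement learning. The annotation files further support targeted analysis and modeling of specific subtasks or robot motion patterns.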
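The split ranges in the dataset card deserve a check before use: as listed, the validation range (960:1080) lies inside the training range (0:1175), and the test range (1080:1201) extends past the 1176 available episodes. The sketch below parses the ranges and clamps them to existing episodes. Treating the stop value as exclusive, as in Python slices, is an assumption, and `parse_split` is a hypothetical helper.

```python
TOTAL_EPISODES = 1176
SPLITS = {"train": "0:1175", "val": "960:1080", "test": "1080:1201"}


def parse_split(spec: str, total_episodes: int) -> range:
    """Turn 'start:stop' into a range clamped to the episodes that exist."""
    start_text, stop_text = spec.split(":")
    start, stop = int(start_text), int(stop_text)
    # Assumption: the stop index is exclusive, as in Python slicing.
    return range(min(start, total_episodes), min(stop, total_episodes))


ranges = {name: parse_split(spec, TOTAL_EPISODES) for name, spec in SPLITS.items()}
overlap = set(ranges["train"]) & set(ranges["val"])

for name, episodes in ranges.items():
    print(name, episodes.start, episodes.stop, len(episodes))
print("train/val overlap:", len(overlap), "episodes")
```

If disjoint splits are needed, one option is to remove the validation and test episodes from the training range; whether that matches the authors' intent is not stated in the card.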
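Background and Challenges
Background Overview
In robot manipulation learning, high-quality, large-scale real-world datasets are a key foundation for advancing imitation learning and reinforcement learning. The Agilex_Cobot_Magic_storage_towel dataset was built and released in 2025 by the RoboCOIN team at the Beijing Academy of Artificial Intelligence (BAAI) as part of the RoboCOIN project, an open bimanual robot data collection effort. It targets a concrete manipulation task: picking up folded towels with a two-finger gripper and placing them in a basket in an office scene. The core research question is how to use rich multi-view perception data and fine-grained action annotations to train a robot for everyday tidying tasks that require precise grasping and placement. Built on the LeRobot format, the dataset contains more than one thousand episodes and over 500,000 frames. It provides real-world interaction trajectories for end-to-end learning of manipulation policies and supports applied research in areas such as household service robots.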
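Current Challenges
The domain challenge this dataset addresses is visuomotor policy learning for dexterous manipulation, specifically reliable grasping and precise placement of a deformable object (a towel) in an unstructured office environment. The task requires algorithms to infer object pose and material properties from multi-view visual input and to generate high-dimensional, continuous joint and end-effector commands. The difficulties lie in handling long-horizon dependencies in actions, modeling contact physics, and generalizing from demonstrations to new situations. Building the dataset poses challenges as well: designing a safe and reliable teleoperation system to collect large amounts of human demonstrations; keeping multiple video streams precisely synchronized and aligned with high-frequency robot state data; annotating complex dual-arm actions with fine-grained subtasks and atomic actions; and storing, managing, and efficiently accessing a large dataset (over 34 GB).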
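Common Scenarios
Typical Use Cases
In robot manipulation learning, the Agilex_Cobot_Magic_storage_towel dataset offers a standardized research platform for dexterous manipulation with dual-arm collaborative robots. It focuses on picking and placing folded towels in an office environment, and its multi-view video streams and high-dimensional state-action sequences supply training samples for imitation learning and reinforcement learning algorithms. Its typical use is training robots to perform fine object grasping and positioning, and it is particularly suited to studying how dual-arm coordination and visual servoing control strategies generalize in unstructured environments.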
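Academic Problems Addressed
The dataset addresses the challenges of data scarcity and task diversity in robot manipulation research. By providing large-scale, high-quality dual-arm demonstration data, it supports work on end-to-end policy learning, multimodal perception fusion, and hierarchical task planning. Its joint states, end-effector trajectories, and multi-camera views provide an experimental basis for studying action representation learning, compensation for state estimation errors, and cross-scene skill transfer, and thereby support algorithmic progress in embodied intelligence for real physical interaction.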
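Derived Related Work
Work derived from this dataset mainly concerns optimizing robot imitation learning frameworks and generalizing multi-task policies. For example, extensions based on the LeRobot format have proposed new spatiotemporal attention mechanisms for relating multi-view visual input to action sequences. Other work has used the fine-grained action annotations to develop hierarchical reinforcement learning models that compose atomic operations into compound tasks. These results further enrich the RoboCOIN ecosystem and give the open-source robotics community reproducible baselines and directions for algorithmic improvement.
The content above was collected and summarized by 遇见数据集.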