RoboMIND

Name: RoboMIND
Creator: maas
Published: 2026-05-17 01:16:52
License: No description provided

ModelScope community; updated 2026-05-17; indexed 2025-09-20

Download link:

https://modelscope.cn/datasets/X-Humanoid/RoboMIND

Resource description:

# [RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation](https://x-humanoid-robomind.github.io/)

[![License](https://img.shields.io/badge/License-Apache_2.0-yellow.svg)](https://opensource.org/licenses/Apache-2.0)
[![Project Page](https://img.shields.io/badge/Project%20Page-RoboMIND-blue.svg)](https://x-humanoid-robomind.github.io/)
[![Dataset](https://img.shields.io/badge/Dataset-flopsera-000000.svg)](https://www.beaicloud.com/datasets/datasetDetail?path=%2Fdata-detail%2F21181956226031626&type=open)
[![Hugging Face](https://img.shields.io/badge/Hugging_Face-RoboMIND-000000.svg)](https://huggingface.co/datasets/x-humanoid-robomind/RoboMIND)
[![arXiv](https://img.shields.io/badge/arXiv-2412.13877-red.svg?style=flat-square)](https://arxiv.org/abs/2412.13877)

Accepted by [Robotics: Science and Systems (RSS) 2025](https://roboticsconference.org/program/papers/152/).

## 💾 Overview of RoboMIND 💾

<img src="./static/images/piechart_new.png" border=0 width=100%>

### 🤖 Composition of RoboMIND 🤖

We present RoboMIND (Multi-embodiment Intelligence Normative Dataset and Benchmark for Robot Manipulation), a comprehensive dataset featuring 107k real-world demonstration trajectories spanning 479 distinct tasks and involving 96 unique object classes.

The RoboMIND dataset integrates teleoperation data from multiple robotic embodiments, comprising 52,926 trajectories from the Franka Emika Panda single-arm robot, 19,152 trajectories from the Tien Kung humanoid robot, 10,629 trajectories from the AgileX Cobot Magic V2.0 dual-arm robot, and 25,170 trajectories from the UR-5e single-arm robot.

RoboMIND provides researchers and developers with an invaluable resource for advancing robotic learning and automation technologies by encompassing a broad spectrum of task types and diverse object categories. This dataset stands out for its substantial scale and exceptional quality, ensuring its effectiveness and reliability in practical applications.
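
As a quick sanity check on the composition figures above, the per-embodiment counts can be summed and expressed as shares of the whole. This is a minimal sketch: the dictionary name `TRAJECTORY_COUNTS` and its keys are chosen here for illustration only, while the numbers are the ones quoted in the paragraph above.

```python
# Per-embodiment trajectory counts as quoted in the composition paragraph.
# The dictionary name and keys are illustrative, not part of the dataset.
TRAJECTORY_COUNTS = {
    "Franka Emika Panda": 52_926,
    "Tien Kung": 19_152,
    "AgileX Cobot Magic V2.0": 10_629,
    "UR-5e": 25_170,
}

# Sum the counts and compute each embodiment's fraction of the total.
total = sum(TRAJECTORY_COUNTS.values())
shares = {name: count / total for name, count in TRAJECTORY_COUNTS.items()}

print(f"total trajectories: {total}")  # 107877, i.e. the "107k" headline figure
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The four counts add up to 107,877, which matches the rounded 107k figure, and the Franka data makes up just under half of all trajectories.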

### 🔎 Distribution of Trajectory Lengths 🔎

Different robotic embodiments exhibit distinct trajectory length distributions. Franka and UR robots typically feature shorter trajectories with fewer than 200 timesteps, making them ideal for training fundamental manipulation skills. In contrast, Tien Kung and AgileX robots generally demonstrate longer trajectories exceeding 500 timesteps, which makes them better suited for training long-horizon tasks and complex skill combinations.

### 🚀 Task Categories 🚀

Based on natural language descriptions and considering factors such as object size, usage scenarios, and operational skills, we classify the dataset tasks into six major categories:

1. Articulated Manipulations (Artic. M.)
2. Coordination Manipulations (Coord. M.)
3. Basic Manipulations (Basic M.)
4. Multiple Object Interactions (Obj. Int.)
5. Precision Manipulations (Precision M.)
6. Scene Understanding (Scene U.)

Beyond basic manipulations, the dataset includes numerous complex tasks, providing rich data support for training generalized robotic policies.

### 💪 Diversity of Objects 💪

The dataset encompasses 96 distinct object categories. In kitchen scenarios, it includes common foods like strawberries, eggs, bananas, and pears, as well as complex adjustable appliances such as ovens and bread makers. In domestic settings, the dataset features both rigid objects like tennis balls and deformable objects like toys. Office and industrial scenarios include small objects requiring precise control, such as batteries and gears. This diverse object range enhances dataset complexity and supports training versatile manipulation policies applicable across various environments.

<img src="./static/images/Distribution_new.png" border=5 width=95%>

## 📁 Data Description 📁

Building high-quality robotic training datasets is crucial for developing end-to-end embodied AI models with strong generalization capabilities.
An ideal dataset should cover diverse scenarios, task types, and robotic embodiments, enabling models to adapt to different environments and reliably execute various tasks. Our team has constructed a large-scale, real-world robotic learning dataset that records interaction data during long-horizon task execution in complex environments, supporting the training of models with general manipulation capabilities.

Below is a partial directory structure example showing two training trajectories and two validation trajectories for a single task using the Franka robot:

```
.
|-- h5_agilex_3rgb
|-- h5_franka_1rgb
|   |-- bread_in_basket
|   |   `-- success_episodes
|   |       |-- train
|   |       |   |-- 1014_144602
|   |       |   |   `-- data
|   |       |   |       `-- trajectory.hdf5
|   |       |   |-- 1014_144755
|   |       |   |   `-- data
|   |       |   |       `-- trajectory.hdf5
|   |       |-- val
|   |       |   |-- 1014_144642
|   |       |   |   `-- data
|   |       |   |       `-- trajectory.hdf5
|   |       |   |-- 1014_151731
|   |       |   |   `-- data
|   |       |   |       `-- trajectory.hdf5
|-- h5_franka_3rgb
|-- h5_simulation
|-- h5_tienkung_gello_1rgb
|-- h5_tienkung_xsens_1rgb
|-- h5_ur_1rgb
```

## 🗃️ HDF5 File Format 🗃️

Please refer to [all_robot_h5_info.md](./static/all_robot_h5_info.md).

Due to equipment maintenance, 675 trajectories in the `h5_franka_3rgb` folder contain image data only from the left and right cameras. For the specific data paths, please refer to [franka_3rgb_2cam_paths.md](./static/franka_3rgb_2cam_paths.md).

In the simulation data, the ratio of camera to robotic-arm acquisition frequency is approximately 1:4. Depth images are not currently available.

## 🧰 Task Language Instructions 🧰

We provide language instructions for each task in [RoboMIND_instr.csv](./static/RoboMIND_v1_2_instr.csv).

## 📊 Example of Data Usage 📊

Please refer to [Quick_Start.ipynb](./quick_start.ipynb).

Please note:

1. For `h5_franka_3rgb`, `h5_franka_1rgb`, `h5_ur_1rgb`, and `h5_franka_fr3_dual`, the image channel order is BGR.
2. For all other robotic embodiments, the image channel order is RGB.

```python
import cv2

# `sensor_type`, `cur_embodiments`, and `img` come from the surrounding loading loop.
if sensor_type == 'rgb_images':
    # Image data for these embodiments is recorded in BGR, so convert it to RGB
    if cur_embodiments in ['h5_franka_3rgb', 'h5_franka_1rgb', 'h5_ur_1rgb', 'h5_franka_fr3_dual']:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # Image data for all other embodiments is already RGB, so img is left unchanged
```

## 🛠️ Dataset Utilities & Scripts

To facilitate data validation and processing, we maintain an official utility repository on GitHub: **[RoboMIND-dataset-utils](https://github.com/Open-X-Humanoid/RoboMIND-dataset-utils)**.

This toolkit currently provides:

* **Data Quality Validation:** Scripts to perform full scans across the `benchmark1_0/1_1/1_2` data and generate detailed CSV reports.
* **End-Effector Pose Recalculation:** For specific data subsets (`h5_ur_1rgb`, `h5_simulation`, `h5_sim_franka_3rgb`), we provide forward kinematics scripts to recalculate accurate end-effector poses directly from `joint_position` data and the corresponding URDF models. We strongly recommend using these scripts if your VLA training or evaluation pipeline strictly relies on end-effector poses.

## 📖 Version Update 📖

### Version 1.1 & 1.2

Compared to Version 1.0, we further expanded the dataset, which now includes 107K trajectories and 479 tasks, and covers 96 different object classes.

In Version 1.2, we added 10 Upright_Cup tasks to Version 1.1: 1 real-world task and 9 tasks from the digital twin environment. The goal of all 10 tasks is to flip a mug, but they involve different environmental settings, such as the range of mug placement, table textures, and mug appearances.

The frame-level fine-grained language instruction annotation data has been updated.
Please refer to [language_description_annotation_json](https://huggingface.co/datasets/x-humanoid-robomind/RoboMIND/tree/main/static/language_description_annotation_json).

For more HDF5 file formats, please refer to [all_robot_h5_info_v1.2.md](./static/all_robot_h5_info_v1.2.md).

### Version 1.0

The initial version of RoboMIND contains 55K trajectories and 279 tasks, and involves 69 different object classes.

## 📝 Citation 📝

If you find RoboMIND helpful in your research, please consider citing:

```
@inproceedings{wu2025robomind,
  title={Robomind: Benchmark on multi-embodiment intelligence normative data for robot manipulation},
  author={Wu, Kun and Hou, Chengkai and Liu, Jiaming and Che, Zhengping and Ju, Xiaozhu and Yang, Zhuqin and Li, Meng and Zhao, Yinuo and Xu, Zhiyuan and Yang, Guang and others},
  booktitle={Robotics: Science and Systems (RSS) 2025},
  year={2025},
  publisher={Robotics: Science and Systems Foundation},
  url={https://www.roboticsproceedings.org/rss21/p152.pdf}
}
```

## Reference Document ##

For the model's inputs and outputs during training, please refer to [robomind.yaml](./static/robomind.yaml).

## 🗨️ Discussions 🗨️

If you are interested in RoboMIND, you are welcome to join our WeChat group for discussions.

<img src="./static/images/qrcode.jpg" border=0 width=30%>
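
Returning to the directory layout and the channel-order note from the data description and data usage sections, the two can be combined into a small indexing helper. The sketch below assumes that every episode follows the `<embodiment>/<task>/success_episodes/<split>/<episode>/data/trajectory.hdf5` pattern shown in the partial tree; only one task is listed there, so other folders may differ. The names `index_trajectories`, `TrajectoryRecord`, and `BGR_EMBODIMENTS` are hypothetical and are not part of the RoboMIND tooling. The code uses only the standard library and does not open the HDF5 files.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator

# Embodiments that the README lists as storing images in BGR order.
BGR_EMBODIMENTS = frozenset(
    {"h5_franka_3rgb", "h5_franka_1rgb", "h5_ur_1rgb", "h5_franka_fr3_dual"}
)


@dataclass(frozen=True)
class TrajectoryRecord:
    """Hypothetical record describing one trajectory file on disk."""
    embodiment: str
    task: str
    split: str  # directory name under success_episodes, e.g. "train" or "val"
    episode: str
    path: Path
    images_are_bgr: bool


def index_trajectories(root: Path) -> Iterator[TrajectoryRecord]:
    """Yield one record per trajectory.hdf5 found under the assumed layout."""
    pattern = "*/*/success_episodes/*/*/data/trajectory.hdf5"
    for path in sorted(root.glob(pattern)):
        # Relative parts: embodiment, task, "success_episodes", split, episode, ...
        embodiment, task, _, split, episode = path.relative_to(root).parts[:5]
        yield TrajectoryRecord(
            embodiment=embodiment,
            task=task,
            split=split,
            episode=episode,
            path=path,
            images_are_bgr=embodiment in BGR_EMBODIMENTS,
        )
```

A loader built on this index could check `images_are_bgr` before deciding whether to apply the BGR-to-RGB conversion shown in the data usage section.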


Provider:

maas

Created:

2025-09-15
