
Daimon-Infinity

ModelScope community · Updated 2026-05-17 · Indexed 2026-04-19
Download link:
https://modelscope.cn/datasets/daimonrobotics/Daimon-Infinity
Resource description:
<div align="center">
<h1 style="margin: 0;">Daimon-Infinity</h1>
</div>

<div align="center">
The world's largest omni-modal robotics dataset for physical AI, including high-resolution tactile sensing.
</div>

<div align="center">
<img src="https://modelscope.cn/datasets/daimonrobotics/images/resolve/master/images/front_en.png" alt="TACEXO Cover" width="100%">
</div>

<div style="display: flex; justify-content: space-between; align-items: center; width: 100%;">
<div>
<a href="https://modelscope.cn/datasets/daimonrobotics/TacExo">
<img src="https://img.shields.io/badge/Modelscope-1890FF?style=for-the-badge&logo=alibabacloud" alt="Modelscope">
</a>
</div>
</div>

🌐 Language: English | [中文](README_CN.md)

---

<div style="font-size:1.1em; margin: 0 0 16px 0; text-align: left;">
To address the key bottleneck of fine-manipulation data in embodied intelligence, Daimon has introduced the <span style="color:#00008B">D</span><span style="color:#0000CD">a</span><span style="color:#1E90FF">i</span><span style="color:#00BFFF">m</span><span style="color:#87CEFA">o</span><span style="color:#ADD8E6">n</span><span style="color:#B0E0E6">-</span><span style="color:#0000CD">I</span><span style="color:#1E90FF">n</span><span style="color:#00BFFF">f</span><span style="color:#87CEFA">i</span><span style="color:#ADD8E6">n</span><span style="color:#B0E0E6">i</span><span style="color:#87CEFA">t</span><span style="color:#00BFFF">y</span> dataset. The project will open-source over 10,000 hours of vision–tactile–language–action (VTLA) multimodal data to the entire industry. Its goal is to build a tactile-centric data ecosystem for embodied intelligence and thereby advance robotic fine manipulation at scale. We believe that large-scale, high-quality tactile data will become a critical driving force for the next generation of embodied intelligent systems.
</div>

<a id="Update"></a>

## 📢 Updates

- [April 15, 2026] 🆕 Daimon-Infinity is released!🔥

## 📋 Table of Contents

- [Daimon-Infinity Tactile Manipulation Dataset](#tacexo-tactile-manipulation-dataset)
  - [📢 Updates](#-updates)
  - [📋 Table of Contents](#-table-of-contents)
  - [✨ Overview](#-overview)
  - [🤖 Hardware Platform](#-hardware-platform)
    - [DM-DataClaw & DM-TacClaw](#dm-dataclaw--dm-tacclaw)
    - [DM-DataDex](#dm-datadex)
  - [🚀 Data Modalities](#-data-modalities)
    - [DM-DataClaw](#dm-dataclaw)
  - [🛠️ Tool Repo](#️-Tool-repo)
  - [🎬 Tasks and Data Content Overview](#-tasks-and-data-content-overview)
    - [Semantic Labels](#semantic-labels)
    - [Data Statistics](#data-statistics)
  - [📦 Dataset](#-dataset)
    - [Dataset Structure](#dataset-structure)
    - [Data Format](#data-format)
      - [Main Data Files](#main-data-files)
      - [Field Descriptions](#field-descriptions)
      - [Observation.state Dimensions (114-D)](#observationstate-dimensions-114-d)
      - [Action Dimensions (111-D)](#action-dimensions-111-d)
      - [Placeholder Values](#placeholder-values)
      - [Video Description](#video-description)
      - [Tactile Video Description](#tactile-video-description)
      - [Audio Description](#audio-description)
    - [Metadata Description: episodes_metadata.json](#metadata-description-episodes_metadatajson)
  - [💡 Training Recommendations](#-training-recommendations)
  - [📥 Data Access](#-data-access)
    - [Sample Data](#sample-data)
  - [📋 Communication](#-communication)
  - [📝 Citation](#-citation)
  - [📄 License](#-license)

<a id="Overview"></a>

## ✨ Overview

We are now releasing the first **1,000 hours** of the **Daimon-Infinity Dataset**, collected entirely by our team. Most of this batch was collected with the **DM-DataClaw**; a smaller portion was captured with the **DM-DataDex**. Both devices support embodiment-free data collection in real environments, which improves generalization and makes data acquisition more scalable.
<a id="Hardware Platform"></a>

## 🤖 Hardware Platform

<a id="DM-DataClaw&DM-TacClaw"></a>

### DM-DataClaw & DM-TacClaw

<div align="center">
<img src="https://modelscope.cn/datasets/daimonrobotics/images/resolve/master/images/tac&dataclaw_en.png" alt="Camera configuration" width="100%" style="display:inline-block; margin-right: 10px;">
</div>

The **DM-DataClaw** is a next-generation tactile UMI device with improvements in both hardware and user experience:

- **Structural Design**: A lightweight, ergonomically optimized body makes operation more convenient and gives operators more direct and stable tactile feedback during data collection.
- **Interaction System**: Headphones and microphones, a mobile mini-app interface, dual physical control buttons, and status indicators together form a low-latency, intuitive human–machine interaction workflow.
- **Multimodal Sensor Integration**: A compact wide-angle fisheye camera provides visual perception, a stereo camera + IMU module provides high-precision spatial trajectory tracking, and fingertip high-resolution tactile sensors with gripper position encoders complete the vision–tactile–motion data acquisition.

**DM-TacClaw** is an electrically actuated gripper with a structure isomorphic to **DM-DataClaw**, enabling zero-shot or few-shot deployment, validation, and execution of models trained on **DM-DataClaw** data:

- **Integrated Multimodal Perception**: A built-in wide-angle fisheye camera covers the manipulation workspace, and both fingers embed high-resolution visuotactile sensors, so vision and touch can be fused to perceive environmental context and contact details precisely.
- **Balanced Performance and Versatility**: A 100 mm stroke and 0.1 mm positioning accuracy support both wide-range grasping and fine manipulation, for stable gripping and complex task execution across diverse objects.
- **System Isomorphism and Flexible Control**: The spatial configuration under the fisheye view matches that of the **DM-DataClaw**, which speeds up model transfer and validation; configurable position, velocity, and torque control allow flexible motion execution and policy adaptation.

<a id="DM-DataDex"></a>

### DM-DataDex

<!-- TODO -->

<div align="center">
<img src="https://modelscope.cn/datasets/daimonrobotics/images/resolve/master/images/datadex_en.png" alt="Camera configuration" width="100%" style="display:inline-block; margin-right: 10px;">
</div>

The **DM-DataDex** is built on **TacExoGaze**, an exoskeleton-based multimodal data acquisition system designed for collecting high-fidelity VTLA data for dexterous manipulation. Rather than using a gripper form factor, TacExoGaze captures human hand motion and contact interactions directly through a wearable exoskeleton instrumented with dense sensing, enabling scalable and precise data collection for embodied AI.

- **Form factor**: Wearable exoskeleton that preserves natural human hand kinematics during manipulation.
- **Tactile sensing**: High-resolution sensors distributed across fingertips for capturing deformation, shear, and contact geometry.
- **Hand tracking**: Exoskeleton joint encoders for finger pose and wrist-mounted trackers for wrist pose.
- **Gaze & head tracking**: VR headset providing egocentric camera, eye tracking, and 6-DoF head pose.
- **Third-view capture**: External camera with attached tracker for scene-level observation with registered pose.
<a id="Data Modalities"></a>

## 🚀 Data Modalities

#### DM-DataClaw

| Modality | Sensor |
|---|---|
| RGB | Wrist-mounted wide-angle fisheye camera |
| Tactile | High-resolution tactile sensors on fingertips |
| Pose | Stereo camera + IMU spatial tracking |
| Gripper State | Gripper position encoder |
| Language | Natural language task annotations |

#### DM-DataDex (TacExoGaze)

| Modality | Sensor |
|---|---|
| RGB (First-person) | VR headset egocentric camera |
| RGBD (Third-view) | External camera with tracker |
| Tactile | High-resolution tactile sensors on fingertips |
| Hand Pose | Exoskeleton joint encoders |
| Wrist Pose | Wrist-mounted tracker |
| Gaze & Head Pose | VR headset (eye tracking + 6-DoF head pose) |
| Third-view Pose | Tracker attached to external camera |
| Language | Natural language task annotations |

> All pose trackers (wrist, third-view camera, VR headset) use the same tracking system, ensuring consistent spatial alignment across modalities.

<a id="Tool Repo"></a>

## 🛠️ Tool Repo

We provide data visualization tools. The URDF files required for visualization are in the `assets` folder. For more details, see the open-source repository: **[Daimon-Infinity-Lite](https://modelscope.cn/datasets/daimonrobotics/Daimon-Infinity-Lite)** 🔥

<a id="Tasks and Data Content Overview"></a>

## 🎬 Tasks and Data Content Overview

The dataset covers a wide range of scenarios, including sorting, assembly operations, and pick-and-place tasks. It includes multimodal observations (such as RGB, tactile deformation/shear/depth, and joint states) and a rich set of atomic skills (e.g., grasping, bimanual manipulation, and tool usage).
### Semantic Labels

<div style="text-align: center;">
<img src="https://modelscope.cn/datasets/daimonrobotics/images/resolve/master/images/DataClawSample.png" width="100%">
</div>

The figure above shows a sample task and how a complete robot manipulation video can be segmented into subtasks based on changes in operation state and phase. In this example, the robot has already completed the grasping stage, and the current segment corresponds to the **transport** phase, in which the object is moved to a new position while a stable grasp is maintained. With this annotation method, a long, continuous manipulation process is reorganized into a clear sequence of stages, which simplifies browsing, annotation, retrieval, and downstream tasks such as action understanding, phase recognition, and manipulation learning.

Each data entry is accompanied by multi-dimensional semantic labels, including:

- **Object Labels**: Category of the manipulated object (e.g., tools, components, daily items)
- **Skill Labels**: Atomic action types (e.g., grasping, placing, rotating, inserting)
- **Task & Scene Identifiers**: Task encoding and scene classification
- **End-effector Type**: DataClaw gripper
- **Language Description**: Natural language description of the task

### Data Statistics

Based on approximately 1,209 hours of collected data, we analyzed the dataset statistically from several perspectives: manipulated objects, action semantics, task hierarchy, data source composition, and action duration. The figures below present the object word cloud, the action word cloud, the task hierarchy, the proportion of data from DM-DataClaw and DM-DataDex, and the distribution of action durations. Together they show the dataset's coverage, structural composition, and long-tail characteristics.
<table align="center">
<tr>
<td align="center" width="50%">
<img src="https://modelscope.cn/datasets/daimonrobotics/images/resolve/master/images/object_word_cloud.png" alt="Object Word Cloud" width="100%">
<br>
<sub><b>Object Word Cloud</b>: Shows the distribution of manipulated objects in the dataset.</sub>
</td>
<td align="center" width="50%">
<img src="https://modelscope.cn/datasets/daimonrobotics/images/resolve/master/images/action_word_cloud.png" alt="Action Word Cloud" width="100%">
<br>
<sub><b>Action Word Cloud</b>: Shows the distribution of atomic skills and action verbs.</sub>
</td>
</tr>
</table>

<table align="center">
<tr>
<td align="center" width="50%" style="text-align: center; vertical-align: top;">
<img src="https://modelscope.cn/datasets/daimonrobotics/images/resolve/master/images/task_hierarchy.png" alt="Task Hierarchy" style="display: block; margin: 0 auto;">
<br>
<sub><b>Task Hierarchy</b>: Illustrates the multi-level organization of the dataset from actions to specific instances.</sub>
</td>
<td align="center" width="50%" style="text-align: center; vertical-align: top;">
<img src="https://modelscope.cn/datasets/daimonrobotics/images/resolve/master/images/dataset_share_pie.png" alt="Dataset Composition" style="display: block; margin: 0 auto;">
<br>
<sub><b>Data Source Composition</b>: Shows the proportion of DM-DataClaw and DM-DataDex within the dataset.</sub>
</td>
</tr>
</table>

<br>

<div align="center">
<img src="https://modelscope.cn/datasets/daimonrobotics/images/resolve/master/images/action_duration.png" alt="Action Duration" width="100%">
<br>
<sub><b>Action Duration Distribution</b>: Shows the duration statistics of different manipulation actions in the dataset.</sub>
</div>

The statistical results indicate that the dataset covers a wide range of object categories, including daily items, tools, containers, and food, as well as high-frequency manipulation skills such as `place`, `insert`, `cover`, `press`, and `secure`.
This makes it well-suited for research in multi-task manipulation learning, skill generalization, and semantically conditioned control.

<a id="Dataset"></a>

## 📦 Dataset

<a id="Dataset Structure"></a>

### Dataset Structure

**DM-DataTac (Bimanual Configuration):**

```text
.
├── episodes_metadata.json   # Episode metadata (including left/right device info, operator, calibration, etc.)
├── audio/                   # Audio data (e.g., task descriptions)
│   └── observation.audio.audio_pre/
│       └── chunk-000/
│           └── file-000.mp3
├── data/                    # Main data files (Parquet format)
│   └── chunk-000/
│       └── file-000.parquet
├── meta/                    # Metadata
│   ├── info.json            # Dataset configuration
│   ├── stats.json           # Dataset statistics
│   ├── tasks.parquet        # Task annotation data
│   └── episodes/            # Episode-level metadata
└── videos/                  # Video data
    ├── observation.images.left_cam_left/                # Left camera (1920x1080)
    ├── observation.images.right_cam_right/              # Right camera (1920x1080)
    ├── observation.images.left_gripper_left_tactile/    # Left gripper (left tactile) (640x480)
    ├── observation.images.left_gripper_right_tactile/   # Left gripper (right tactile) (640x480)
    ├── observation.images.right_gripper_left_tactile/   # Right gripper (left tactile) (640x480)
    ├── observation.images.right_gripper_right_tactile/  # Right gripper (right tactile) (640x480)
    ├── observation.deformation.*_tactile/               # Tactile deformation data (384x288)
    ├── observation.shear.*_tactile/                     # Tactile shear data (384x288)
    └── observation.depth.*_tactile/                     # Tactile depth data (384x288)
```

**DM-DataTac (Single-Hand Configuration):**

```text
.
├── episodes_metadata.json   # Episode metadata (including device info, operator, calibration, etc.)
├── data/                    # Main data files (Parquet format)
│   └── chunk-000/
│       └── file-000.parquet
├── meta/                    # Metadata
│   ├── info.json            # Dataset configuration
│   ├── stats.json           # Dataset statistics
│   ├── tasks.parquet        # Task annotation data
│   └── episodes/            # Episode-level metadata
└── videos/                  # Video data
    ├── observation.images.cam_*/                    # Left/right cameras (1920x1080)
    ├── observation.images.gripper_left_tactile/     # Gripper left tactile (640x480)
    ├── observation.images.gripper_right_tactile/    # Gripper right tactile (640x480)
    ├── observation.deformation.gripper_*_tactile/   # Tactile deformation data (384x288)
    ├── observation.shear.gripper_*_tactile/         # Tactile shear data (384x288)
    └── observation.depth.gripper_*_tactile/         # Tactile depth data (384x288)
```

**DM-DataDex Configuration:**

```text
.
├── data/                    # Main data files (Parquet format)
│   └── chunk-000/
│       └── file-000.parquet
├── meta/                    # Metadata
│   ├── info.json            # Dataset configuration
│   ├── stats.json           # Dataset statistics
│   ├── tasks.parquet        # Task annotation data
│   └── episodes/            # Episode-level metadata
└── videos/                  # Video data
    ├── observation.images.cam_headset.left_frame/   # Left-eye camera (640x480)
    ├── observation.images.cam_headset.right_frame/  # Right-eye camera (640x480)
    ├── observation.images.cam_third_view/           # Third-person camera (640x480)
    ├── observation.deformation.*_tactile/           # Tactile deformation data (384x288)
    ├── observation.shear.*_tactile/                 # Tactile shear data (384x288)
    └── observation.depth.*_tactile/                 # Tactile depth data (384x288)
```

<a id="Data Formate"></a>

### Data Format

#### Main Data Files

- **Format:** Parquet
- **Path:** `data/chunk-xxx/file-xxx.parquet`
- **Video Files:** `videos/{video_key}/chunk-xxx/file-xxx.mp4`
- **Metadata:** `meta/info.json`

#### Field Descriptions

Each Parquet sample contains the following fields:

| Field Name | Data Type | Shape | Description |
|------------|-----------|-------|-------------|
| `observation.state` | float32 | [114] | Robot state (including pose, joint angles, IMU, etc.) |
| `action` | float32 | [111] | Teleoperation action commands |
| `score` | float32 | [1] | Episode score |
| `timestamp` | float32 | [1] | Timestamp |
| `frame_index` | int64 | [1] | Frame index |
| `episode_index` | int64 | [1] | Episode index |
| `index` | int64 | [1] | Data index |
| `task_index` | int64 | [1] | Task annotation index |

#### Observation.state Dimension Details (114-D)

| Index Range | Dimension Name | Description |
|-------------|----------------|-------------|
| 1–7 | left_x, left_y, left_z, left_qx, left_qy, left_qz, left_qw | Left arm end-effector pose (position + quaternion) |
| 8–14 | right_x, right_y, right_z, right_qx, right_qy, right_qz, right_qw | Right arm end-effector pose (position + quaternion) |
| 15–21 | head_x ~ head_qw | Head pose |
| 22–28 | left_eye_x ~ left_eye_qw | Left eye pose |
| 29–35 | right_eye_x ~ right_eye_qw | Right eye pose |
| 36–42 | third_x ~ third_qw | Third-view pose |
| 43–49 | arm_left_1 ~ arm_left_7 | Left arm joint angles |
| 50–56 | arm_right_1 ~ arm_right_7 | Right arm joint angles |
| 57–58 | head_pitch, head_yaw | Head joints |
| 59–63 | hip_pitch, hip_yaw, knee, left_wheel, right_wheel | Base/chassis state |
| 64 | gripper | Gripper state |
| 65–66 | gripper_left, gripper_right | Left/right gripper opening angles |
| 67–102 | finger0 ~ finger35 | Finger encoder data (left hand: finger0 - finger17, right hand: finger18 - finger35) |
| 103–108 | left_Acc_X/Y/Z, left_Gyro_X/Y/Z | Left IMU data |
| 109–114 | right_Acc_X/Y/Z, right_Gyro_X/Y/Z | Right IMU data |

#### Action Dimension Details (111-D)

| Index Range | Dimension Name | Description |
|-------------|----------------|-------------|
| 1–7 | left_x ~ left_qw | Target pose of left arm end-effector |
| 8–14 | right_x ~ right_qw | Target pose of right arm end-effector |
| 15–21 | head_x ~ head_qw | Target head pose |
| 22–28 | hip_x ~ hip_qw | Target base/chassis pose |
| 29–30 | v, w | Linear and angular velocity of the base |
| 31–37 | left_eye_x ~ left_eye_qw | Target pose of left eye |
| 38–44 | right_eye_x ~ right_eye_qw | Target pose of right eye |
| 45–51 | third_x ~ third_qw | Target pose of third view |
| 52–58 | arm_left_1 ~ arm_left_7 | Target joint positions of left arm |
| 59–65 | arm_right_1 ~ arm_right_7 | Target joint positions of right arm |
| 66–67 | head_pitch, head_yaw | Target head joints |
| 68–72 | hip_pitch, hip_yaw, knee, left_wheel, right_wheel | Target base joints/wheels |
| 73 | gripper | Target gripper state |
| 74–75 | gripper_left, gripper_right | Target gripper opening angles (left/right) |
| 76–111 | finger0 ~ finger35 | Finger control targets (left hand: finger0 - finger17, right hand: finger18 - finger35) |

#### Placeholder Value Description

- The value `9930` indicates a **placeholder / invalid / inactive dimension**
- It is recommended to convert this value into a **mask** before training, rather than treating it as a real physical value during normalization

#### Video Description

The dataset includes multiple video streams, varying by configuration:

**dual_ugripper configuration:**

| Video Key | Resolution | Description | Example Serial |
|-----------|------------|-------------|----------------|
| `observation.images.left_cam_left` | 1920×1080 | Left camera | - |
| `observation.images.right_cam_right` | 1920×1080 | Right camera | - |
| `observation.images.left_gripper_left_tactile` | 640×480 | Left gripper (left tactile sensor) | X25480083 |
| `observation.images.left_gripper_right_tactile` | 640×480 | Left gripper (right tactile sensor) | X26040033 |
| `observation.images.right_gripper_left_tactile` | 640×480 | Right gripper (left tactile sensor) | X26040206 |
| `observation.images.right_gripper_right_tactile` | 640×480 | Right gripper (right tactile sensor) | X26040293 |

**ugripper_right / ugripper_left configuration:**

| Video Key | Resolution | Description | Example Serial |
|-----------|------------|-------------|----------------|
| `observation.images.cam_*` | 1920×1080 | Left/right cameras | - |
| `observation.images.gripper_left_tactile` | 640×480 | Gripper left tactile sensor | X25510109 |
| `observation.images.gripper_right_tactile` | 640×480 | Gripper right tactile sensor | X25480013 |

- RGB videos (`observation.images.*`) are typically encoded in **H.264**, pixel format **yuv420p**, at **30 FPS**
- Tactile-derived videos (`observation.deformation.*`, `observation.shear.*`, `observation.depth.*`) are typically stored as **.mov + FFV1 + gbrp16le**
- Actual encoding and formats may vary depending on the released dataset

#### Tactile Video Description

Tactile videos include **deformation**, **shear**, and **depth** data, stored in `.mov` format. The tactile data is encoded in **uint16** and should be properly decoded during processing.

#### Audio Description

The `audio` folder contains recorded audio descriptions of tasks provided by the operator.

<a id="Metadata Description: episodes_metadata.json"></a>

### Metadata Description: episodes_metadata.json

Records left/right sources, operator (data collector), and calibration information separately.

<a id="Training Recommendations"></a>

## 💡 Training Recommendations

- Apply masking to dimensions with the placeholder value `9930`; do not treat them as real physical values during normalization
- Perform standardization/normalization only on valid dimensions
- Align data temporally using `episode_index + frame_index`, and ensure both video and state data use a unified `fps = 30`
- Tactile data is encoded in `uint16`; decode and convert it back to floating-point values during processing
- For the `dual_ugripper` configuration, ensure proper synchronization and alignment between left and right streams

<a id="Data Access"></a>

## 📥 Data Access

- **Public Platform:** This dataset will be publicly released on major platforms such as ModelScope, aiming to facilitate research and development for both domestic and international developers.
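Returning to the placeholder handling described under Training Recommendations, the following is a minimal sketch rather than the dataset's official tooling. It assumes the index ranges in the dimension tables are 1-based and inclusive (so `103–108` maps to `slice(102, 108)`), that the placeholder is stored as exactly `9930`, and that masked entries may be replaced by `0.0` after normalization. The names `STATE_FIELDS`, `valid_mask`, `masked_stats`, and `normalize` are hypothetical, and the toy 3-D frames stand in for real 114-D `observation.state` vectors.

```python
import math

# Placeholder marker documented in the README; everything else is illustrative.
PLACEHOLDER = 9930.0

# Assumption: the tables use 1-based inclusive ranges, so "103–108"
# becomes slice(102, 108). Only a few fields are listed as examples.
STATE_FIELDS = {
    "left_pose": slice(0, 7),      # indices 1–7
    "right_pose": slice(7, 14),    # indices 8–14
    "left_imu": slice(102, 108),   # indices 103–108
    "right_imu": slice(108, 114),  # indices 109–114
}


def valid_mask(frames, placeholder=PLACEHOLDER):
    """Return a per-frame boolean mask: True where the value is real data."""
    return [[value != placeholder for value in frame] for frame in frames]


def masked_stats(frames, mask):
    """Compute per-dimension mean and std from valid entries only."""
    means, stds = [], []
    for dim in range(len(frames[0])):
        values = [f[dim] for f, m in zip(frames, mask) if m[dim]]
        if not values:
            # Dimension is never active: fall back to neutral statistics.
            means.append(0.0)
            stds.append(1.0)
            continue
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        means.append(mean)
        # Guard against zero variance so the later division is safe.
        stds.append(math.sqrt(var) or 1.0)
    return means, stds


def normalize(frames, mask, means, stds):
    """Standardize valid entries; masked entries are set to 0.0."""
    return [
        [(v - means[d]) / stds[d] if m[d] else 0.0 for d, v in enumerate(f)]
        for f, m in zip(frames, mask)
    ]


# Toy 3-D frames standing in for 114-D observation.state vectors.
frames = [[1.0, 9930.0, 4.0], [3.0, 9930.0, 9930.0]]
mask = valid_mask(frames)
means, stds = masked_stats(frames, mask)
normalized = normalize(frames, mask, means, stds)
```

Keeping the mask alongside the normalized values lets a training loop exclude inactive dimensions from the loss instead of learning to predict the fill value. With real data the same logic would normally be vectorized in an array library; plain lists are used here only to keep the sketch dependency-free.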
### Sample Data

You can access one episode of sample data for both **DM-DataClaw** and **DM-DataDex** via the link below:

👉 [Daimon-Infinity-Lite](https://modelscope.cn/datasets/daimonrobotics/Daimon-Infinity-Lite)

- **DM-DataClaw** sample
- **DM-DataDex** sample

<a id="Communication"></a>

## 📋 Communication

If you have any further questions or are interested in collaboration, please feel free to contact us at: daimon-infinity@dmrobot.com

<a id="Citation"></a>

## 📝 Citation

If you use this dataset, please cite the corresponding papers below.

### DataClaw Dataset Citation

```bibtex
@misc{tacclaw_infinity_2026,
  title  = {TacClaw-Infinity: A Large-Scale VTLA Demonstration Dataset for Dexterous Manipulation},
  author = {...},
  year   = {2026},
  note   = {Preprint forthcoming}
}
```

### DataDex Dataset Citation

```bibtex
@misc{tacexogaze_2026,
  title  = {TacExoGaze: A Human-Centric Glove Interface for Multi-View Vision-Tactile Data Collection in Dexterous Manipulation},
  author = {...},
  year   = {2026},
  note   = {Preprint forthcoming}
}
```

> The final versions of these papers will be released on arXiv and updated here once available.

---

## 📄 License

This dataset is released under: **CC BY-NC-SA 4.0 License**

---
Provider:
maas
Created:
2026-04-08