RoboManip-Traj-Demo
Source: ModelScope community. Updated 2025-12-05; indexed 2025-12-06.
Download link: https://modelscope.cn/datasets/Codatta/RoboManip-Traj-Demo
# Codatta Robotic Manipulation Trajectory (Sample)
## Dataset Summary
This dataset contains high-quality annotated trajectories of robotic gripper manipulations. It is designed to train models for fine-grained control, trajectory prediction, and object interaction tasks.
Produced by **Codatta**, this dataset focuses on third-person views of robotic arms performing pick-and-place or manipulation tasks. Each sample includes the raw video, a visualization of the trajectory, and a rigorous JSON annotation of keyframes and coordinate points.
**Note:** This is a sample dataset containing **50 annotated examples**.
## Supported Tasks
* **Trajectory Prediction:** Predicting the path of a gripper based on visual context.
* **Keyframe Extraction:** Identifying critical moments in a manipulation task (e.g., contact, velocity change).
* **Robotic Control:** Imitation learning from human-demonstrated or teleoperated data.
## Dataset Structure
### Data Fields
* **`id`** (string): Unique identifier for the trajectory sequence.
* **`total_frames`** (int32): Total number of frames in the video sequence.
* **`video_path`** (string): Path to the source MP4 video file recording the manipulation action.
* **`trajectory_image`** (image): A JPEG preview showing the overlaid trajectory path or keyframe visualization.
* **`annotations`** (string): A JSON-formatted string containing the detailed coordinate data.
  * *Structure:* Contains lists of keyframes, timestamps, and the 5-point gripper coordinates for each annotated frame.
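
The exact JSON layout of `annotations` is not spelled out here, so the following is only a minimal parsing sketch under an assumed schema: a list of keyframe objects with the keys `frame_index`, `timestamp`, and `points` (five `[x, y]` pairs). Those key names, the `Keyframe` class, and `parse_annotations` are hypothetical; inspect a real sample and adjust them before relying on this.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Keyframe:
    frame_index: int
    timestamp: float
    points: List[Tuple[float, float]]  # five (x, y) coordinates


def parse_annotations(raw: str) -> List[Keyframe]:
    """Parse the JSON string stored in the `annotations` field.

    ASSUMPTION: hypothetical schema, a list of objects with the keys
    "frame_index", "timestamp", and "points". The real layout may differ.
    """
    keyframes = []
    for item in json.loads(raw):
        points = [(float(x), float(y)) for x, y in item["points"]]
        if len(points) != 5:
            raise ValueError(f"expected 5 points, got {len(points)}")
        keyframes.append(
            Keyframe(int(item["frame_index"]), float(item["timestamp"]), points)
        )
    # Keep keyframes in temporal order.
    return sorted(keyframes, key=lambda k: k.frame_index)


# Made-up example string, not taken from the dataset.
example = json.dumps([
    {"frame_index": 12, "timestamp": 0.4,
     "points": [[10, 20], [30, 20], [12, 5], [28, 5], [20, 0]]},
    {"frame_index": 0, "timestamp": 0.0,
     "points": [[11, 22], [31, 22], [13, 7], [29, 7], [21, 2]]},
])
keyframes = parse_annotations(example)
print([k.frame_index for k in keyframes])
```

Validating the point count at parse time surfaces schema mismatches early instead of letting them propagate into training code.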
### Data Preview
*(Hugging Face's viewer will automatically render the `trajectory_image` here)*
## Annotation Standards
The data was annotated following a strict protocol to ensure precision and consistency.
### 1. Viewpoint Scope
* **Included:** Third-person views (fixed camera recording the robot).
* **Excluded:** First-person views (Eye-in-Hand) are explicitly excluded to ensure consistent coordinate mapping.
### 2. Keyframe Selection
Annotations are sparse rather than dense (per-frame), focusing on **Keyframes** that define the motion logic. A Keyframe is defined by any of the following events:
1. **Start Frame:** The gripper first appears on screen.
2. **End Frame:** The gripper leaves the screen.
3. **Velocity Change:** Frames where the direction of motion changes abruptly (marked at the minimum-speed point).
4. **State Change:** Frames where the gripper opens or closes.
5. **Contact:** The precise moment the gripper touches the object.
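
As an illustration of the Velocity Change rule, the sketch below flags frames where a tracked point's speed reaches a local minimum while its direction of motion turns sharply. It is not the annotators' tooling. The function name `velocity_change_frames`, the per-frame `positions` input, and the 60-degree threshold are all assumptions made for this example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def velocity_change_frames(positions: List[Point],
                           min_turn_deg: float = 60.0) -> List[int]:
    """Return frame indices where speed is a local minimum and the
    direction of motion turns by at least `min_turn_deg` degrees.

    ASSUMPTION: `positions` holds one (x, y) per frame for a single
    tracked point; the threshold is an arbitrary illustrative value.
    """
    n = len(positions)
    # vel[i] is the displacement from frame i to frame i + 1.
    vel = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(positions, positions[1:])]
    # Speed at interior frame i: mean of incoming and outgoing step lengths.
    speed = {i: 0.5 * (math.hypot(*vel[i - 1]) + math.hypot(*vel[i]))
             for i in range(1, n - 1)}
    frames = []
    for i in range(2, n - 2):
        vin, vout = vel[i - 1], vel[i]
        if math.hypot(*vin) == 0 or math.hypot(*vout) == 0:
            continue  # direction is undefined while the point is stationary
        cross = vin[0] * vout[1] - vin[1] * vout[0]
        dot = vin[0] * vout[0] + vin[1] * vout[1]
        turn = abs(math.degrees(math.atan2(cross, dot)))
        is_local_min = speed[i] <= speed[i - 1] and speed[i] <= speed[i + 1]
        if is_local_min and turn >= min_turn_deg:
            frames.append(i)
    return frames


# Made-up path: move right while slowing down, then turn and move along y.
path = [(0, 0), (4, 0), (7, 0), (8, 0), (8, 1), (8, 4), (8, 8)]
detected = velocity_change_frames(path)
print(detected)
```

Requiring both conditions avoids flagging smooth decelerations (no turn) and constant-speed curves (no speed minimum).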
### 3. The 5-Point Annotation Method
For every annotated keyframe, the gripper is labeled with **5 specific coordinate points** to capture its pose and state accurately:
| Point ID | Description | Location Detail |
| :--- | :--- | :--- |
| **Points 1 & 2** | **Fingertips** | Center of the bottom edge of the gripper tips. |
| **Points 3 & 4** | **Gripper Ends** | The rearmost points of the closing area (indicating the finger direction). |
| **Point 5** | **Tiger's Mouth** | The center of the crossbeam (base of the gripper). |
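
To show how the five points might be turned into pose features, here is a hedged sketch that derives an opening width, a grasp center, and a pointing angle in 2D. It assumes the points arrive in table order (Points 1 to 5) as image coordinates. The function name `gripper_features` and the output keys are hypothetical, and which fingertip counts as Point 1 versus Point 2 is not specified in this card.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def gripper_features(points: Sequence[Point]) -> dict:
    """Derive simple 2D features from the five annotated points.

    ASSUMPTION: `points` is ordered as in the table above,
    [tip_1, tip_2, end_3, end_4, base_5], in image coordinates.
    """
    if len(points) != 5:
        raise ValueError("expected exactly 5 points")
    tip_1, tip_2, _end_3, _end_4, base = points
    # Opening width: distance between the two fingertips.
    opening = math.dist(tip_1, tip_2)
    # Midpoint between the fingertips, a rough grasp center.
    mid = ((tip_1[0] + tip_2[0]) / 2, (tip_1[1] + tip_2[1]) / 2)
    # Pointing direction: from the base (Point 5) toward the fingertip midpoint.
    angle = math.degrees(math.atan2(mid[1] - base[1], mid[0] - base[0]))
    return {"opening": opening, "grasp_center": mid, "angle_deg": angle}


# Made-up coordinates for illustration only.
features = gripper_features([(40, 100), (80, 100), (45, 60), (75, 60), (60, 40)])
print(features)
```

Tracking `opening` across keyframes gives one simple way to cross-check State Change events, since an open or close action should show up as a change in fingertip distance.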
### 4. Quality Control
* **Accuracy:** All datasets passed a rigorous quality assurance process with a minimum **95% accuracy rate**.
* **Occlusion Handling:** If the gripper is partially occluded, points are estimated based on object geometry. Sequences where the gripper is fully occluded or only shows a side profile without clear features are discarded.
## Usage Example
```python
from datasets import load_dataset
import json

# Load the dataset
ds = load_dataset("Codatta/robotic-manipulation-trajectory", split="train")

# Access a sample
sample = ds[0]

# View the image
print(f"Trajectory ID: {sample['id']}")
sample['trajectory_image'].show()

# Parse annotations
annotations = json.loads(sample['annotations'])
print(f"Keyframes count: {len(annotations)}")
```
Provider: maas
Created: 2025-11-28



