RoboXTechnologies/RoboX-EgoGrasp-v0.1
Source: Hugging Face · Updated 2026-04-08 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/RoboXTechnologies/RoboX-EgoGrasp-v0.1
Description:
---
license: cc-by-nc-sa-4.0
task_categories:
- robotics
- video-classification
- object-detection
tags:
- egocentric
- grasping
- manipulation
- imitation-learning
- hand-object-interaction
- robotics
- crowdsourced
- embodied-ai
- ego-grasp
- robox
language:
- en
pretty_name: EgoGrasp
size_categories:
- n<1K
---
# EgoGrasp
<video controls width="900" src="https://huggingface.co/datasets/RoboXTechnologies/RoboX-EgoGrasp-v0.1/resolve/main/RoboX_movie.mp4"></video>
EgoGrasp is a crowdsourced egocentric video dataset of human grasping interactions, built for robotics imitation learning. Each clip captures a single grasp action, filmed from a first-person perspective with a smartphone. The full dataset covers 620+ unique everyday object categories.
## What's Included Here
This repository contains a **sample of 10 annotated clips** from the full EgoGrasp dataset. The sample is intended to help researchers evaluate data quality, annotation depth, and compatibility with their pipelines before requesting access to the full collection.
**To request access to the full dataset (1,800+ clips, 620+ object categories), visit [robox.to](https://robox.to).**
## Dataset Summary
- **Sample clips (this repo):** 10
- **Full dataset:** 1,800+ clips across 620+ object categories
- **Perspective:** First-person (egocentric), smartphone-captured
- **Source:** Crowdsourced via the RoboX mobile app
- **Annotations:** Multi-pass pipeline including hand keypoints, object bounding boxes and tracking, action segmentation, and spatial context labels
Statistics for the 10-clip sample export in this repository:
| Property | Value |
|----------|-------|
| Total clips | 10 |
| Total duration | 2 min (~0.03 hours) |
| Contributors | 2 (anonymized) |
| Clips with video | 10 |
| Verified clips | 10 |
| Campaign type | ego_grasp |
| Export date | 2026-04-08 |
| Schema version | 0.1 |
## Collection Method
Videos are collected through the RoboX mobile app by distributed contributors following structured task prompts. Contributors record short clips of themselves picking up, holding, and placing common household and workplace objects. Quality filtering and review are applied before clips enter the annotation pipeline.
The app captures video with rich per-frame metadata including camera pose (6DoF), IMU data (200 Hz), hand keypoints (21 joints), body pose, object detection, scene planes, optical flow, audio levels, navigation data, and quality metrics. On-device processing applies face detection and blurring before the video leaves the device.
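Because the IMU is sampled at 200 Hz while video frames arrive at a much lower rate, pipelines that consume both streams usually need to align them. The following is a minimal sketch of nearest-timestamp alignment. It assumes each stream carries timestamps in seconds; the timestamp format, the 30 fps frame rate, and the function name `nearest_imu_index` are illustrative assumptions and are not taken from the dataset schema.

```python
from bisect import bisect_left


def nearest_imu_index(imu_times, frame_time):
    """Return the index of the IMU sample closest in time to frame_time.

    imu_times must be sorted in ascending order (seconds).
    """
    if not imu_times:
        raise ValueError("imu_times is empty")
    pos = bisect_left(imu_times, frame_time)
    if pos == 0:
        return 0
    if pos == len(imu_times):
        return len(imu_times) - 1
    before, after = imu_times[pos - 1], imu_times[pos]
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return pos - 1 if frame_time - before <= after - frame_time else pos


# Hypothetical one-second recording: 200 Hz IMU, assumed 30 fps video.
imu_times = [i / 200 for i in range(200)]
frame_times = [i / 30 for i in range(30)]
aligned = [nearest_imu_index(imu_times, t) for t in frame_times]
```

Under these assumptions each video frame maps to one of roughly 6 to 7 IMU samples recorded during that frame interval. Interpolating between the two neighbouring samples is a common alternative when smoother signals are needed.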
## Annotation Pipeline
Each clip is processed through a layered annotation pipeline:
1. **Hand keypoints** — 2D joint positions for both hands across all frames
2. **Object detection and tracking** — Bounding boxes with per-frame object identity tracking
3. **Action segmentation** — Temporal labels for reach, grasp, lift, hold, place, and release phases
4. **Spatial context** — Scene-level labels describing surface type, environment, and camera viewpoint
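The action segmentation output can be thought of as a list of labeled time intervals. The sketch below shows one way to represent and query such intervals. The `Segment` class, its field names, the half-open interval convention, and the sample values are all hypothetical; inspect the actual annotation files for the real schema.

```python
from dataclasses import dataclass
from typing import Optional

# Phase names as listed in the annotation pipeline description.
PHASES = ("reach", "grasp", "lift", "hold", "place", "release")


@dataclass(frozen=True)
class Segment:
    label: str
    start: float  # seconds, inclusive (assumed convention)
    end: float    # seconds, exclusive (assumed convention)


def phase_at(segments, t) -> Optional[str]:
    """Return the label of the segment containing time t, or None."""
    for seg in segments:
        if seg.start <= t < seg.end:
            return seg.label
    return None


def phase_durations(segments):
    """Sum the duration of each label across all segments."""
    totals = {}
    for seg in segments:
        totals[seg.label] = totals.get(seg.label, 0.0) + (seg.end - seg.start)
    return totals


# Invented values for illustration only, not real annotations.
example = [
    Segment("reach", 0.0, 0.8),
    Segment("grasp", 0.8, 1.2),
    Segment("lift", 1.2, 2.0),
]
```

With this representation, `phase_at(example, 1.0)` looks up the phase active at one second into the clip, and `phase_durations(example)` gives the time spent in each phase.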
## Use Cases
EgoGrasp is designed for researchers working on dexterous manipulation, grasp planning, hand-object interaction modeling, and policy learning from human demonstrations. The egocentric viewpoint and real-world diversity make it well suited for sim-to-real transfer and learning from unstructured environments.
Specific applications include:
- Robotic manipulation / grasping policy training via imitation learning
- Object recognition in egocentric settings
- Hand-object interaction understanding
- Benchmarking grasp detection and grip classification models
## Dataset Structure
- `metadata/clips.json` — Per-clip metadata (device, duration, quality, contributor)
- `clips/` — Video files (MP4, H.265)
- `annotations/clips.jsonl` — **Dataset index**: per-clip metadata, labels, narration, action segments, file references
- `annotations/hand_keypoints/` — Per-frame hand joint positions (21 keypoints per hand, grip type)
- `annotations/object_tracks/` — Per-frame detected objects with bounding boxes
- `annotations/actions/` — Temporal action segments (reach, grasp, idle) derived from grip state changes
- `annotations/sensors/` — Per-frame sensor data: IMU (accelerometer, gyro, magnetometer), 6DoF camera pose, camera intrinsics
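Since `annotations/clips.jsonl` serves as the dataset index, a reasonable first step is to read it into memory and inspect the available fields. The sketch below assumes the file follows the usual JSON Lines convention of one JSON object per line. The function name `load_clip_index` is illustrative, and no field names are assumed.

```python
import json
from pathlib import Path


def load_clip_index(path):
    """Read a JSON Lines file into a list of dicts, one per clip record."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line to make bad exports easy to find.
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
    return records
```

After downloading the repository, calling `load_clip_index("annotations/clips.jsonl")` and printing the keys of the first record is a quick way to discover the actual schema before writing any field-specific code.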
## Full Dataset Access
The complete EgoGrasp dataset is available upon request. Visit [robox.to](https://robox.to) to learn more and submit an access request.
## License
CC-BY-NC-SA-4.0 — Free for research and non-commercial use, with share-alike requirements.
## Citation
If you use EgoGrasp in your research, please cite:
```bibtex
@dataset{robox_ego_grasp_2026,
title={RoboX-EgoGrasp-v0.1},
author={RoboX Team},
year={2026},
campaign={EgoGrasp}
}
```
Provided by:
RoboXTechnologies



