lidavidsh/franka-pick-kitchen-up-wrist-100ep-genesis
Source: Hugging Face | Updated 2026-04-17 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/lidavidsh/franka-pick-kitchen-up-wrist-100ep-genesis
Resource description:
---
license: apache-2.0
task_categories:
- robotics
tags:
- LeRobot
- Genesis
- Franka
- manipulation
- synthetic-data
- kitchen-scene
- eye-in-hand
- wrist-camera
size_categories:
- 10K<n<100K
---
# Franka Pick-Cube — Kitchen Scene, Overhead + Wrist Camera (100 episodes)
Synthetic expert demonstrations of a **Franka Panda** picking a red cube in a **realistic rustic kitchen** environment, captured with a **2-camera layout: static overhead + eye-in-hand wrist camera**. Generated with the Genesis physics simulator on an **AMD Radeon AI PRO R9700 (RDNA4)** GPU.
Part of the [Robot Synthetic Data Generation Workshop](https://github.com/lidavidsh/Robot_synthetic_data_generation_workshop).
> **Companion dataset**: [`lidavidsh/franka-pick-kitchen-100ep-genesis`](https://huggingface.co/datasets/lidavidsh/franka-pick-kitchen-100ep-genesis) — same 100 episodes, same seed, but with a **static side camera** instead of wrist. Use both for a head-to-head camera-layout ablation.
## Dataset Details
| Item | Value |
| ----------------- | ------------------------------------------------------- |
| Format | LeRobot v3.0, AV1 video |
| Scene | rustic_kitchen (126.9 MB GLB mesh) |
| Anchor | floor_origin |
| Episodes / Frames | 100 / 13,500 |
| FPS | 30 |
| Cameras | 2 × 640×480: `observation.images.up` (overhead) + `observation.images.side` (**wrist / eye-in-hand**) |
| Action space | 9-DoF joint position (7 arm + 2 finger) |
| Robot | Franka Panda |
| Gen success rate | 100 / 100 = 100 % |
| Size | ~198 MB |
| Generation GPU | AMD Radeon AI PRO R9700 (RDNA4, gfx1201) |
| Software | Genesis 0.4.5, LeRobot 0.4.4, ROCm 7.2, seed = 42 |
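
The per-episode length is not listed in the table, but it follows from the totals. A quick check, assuming all 100 episodes have the same number of frames (the card only states the totals, so this is an assumption):

```python
# Values copied from the Dataset Details table.
total_frames = 13_500
episodes = 100
fps = 30

# Assumption: every episode has the same length; only totals are documented.
frames_per_episode = total_frames / episodes
seconds_per_episode = frames_per_episode / fps

print(f"{frames_per_episode:.0f} frames/episode, {seconds_per_episode:.1f} s/episode")
```

Under that assumption each demonstration is 135 frames, or 4.5 s of motion at 30 FPS.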
## Camera Layout (D040 configuration)
The **wrist camera** is rigidly attached to the Franka `hand` link with the following hand-local pose ("D040" config), empirically validated on the flat scene with a **42.8 % mean policy success rate** (vs. 25 % for the up+side baseline, R9e/R9f experiments):
| Parameter | Value | Note |
| -------------- | -------------------- | --------------------------------------------------------- |
| Position | `(0.05, 0.00, -0.08)` m | ~8 cm below hand origin, slightly offset toward fingers |
| Look-at | `(0.00, 0.00, 0.10)` m | Looks toward the space between the gripper fingers |
| Up-vector | `(0, 0, -1)` | Roll flipped so gripper opening is horizontal in the view |
| FOV | 65° | Wide enough to keep the cube in frame during approach |
| Resolution | 640 × 480 | Same as overhead |
The **overhead camera** is world-fixed at a canonical top-down pose identical to the companion `up+side` dataset.
**Why D040 over a closer / narrower wrist pose?** See the "D040 vs D000 对比" (D040 vs. D000 comparison) section of `exps/rdna4_exp.md` in the workshop repo. In short, D040 gives up some human-readable cube visibility in exchange for lower motion sensitivity, broader scene context, and information that is clearly orthogonal to the overhead camera's view, which empirically yields a much more robust policy.
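
To make the wrist pose table more concrete, the sketch below derives the camera's viewing axes from the position, look-at point, and up-vector given above, all expressed in the `hand` link frame. It uses a common look-at construction (`right = forward × up`); the helper names are hypothetical, and Genesis may use a different internal axis convention, so treat this as an illustration of the geometry rather than the simulator's implementation.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def look_at_axes(position, target, up_hint):
    """Return unit (forward, right, up) axes for a camera at `position` looking at `target`."""
    forward = normalize(sub(target, position))
    # Assumed convention: right = forward x up_hint, then re-orthogonalize up.
    right = normalize(cross(forward, up_hint))
    up = cross(right, forward)
    return forward, right, up

# D040 values from the table, in the hand-local frame (metres).
forward, right, up = look_at_axes(
    position=(0.05, 0.0, -0.08),
    target=(0.0, 0.0, 0.10),
    up_hint=(0.0, 0.0, -1.0),
)

# Angle between the viewing direction and the hand's local z-axis.
tilt_deg = math.degrees(math.acos(forward[2]))
print("forward:", forward)
print("right:  ", right)
print("up:     ", up)
print(f"tilt from hand z-axis: {tilt_deg:.1f} deg")
```

With these numbers the viewing direction is tilted roughly 15.5° away from the hand's local z-axis, and the camera's right axis lies along the hand's negative y-axis under the assumed convention.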
## Usage
```python
from lerobot.datasets.lerobot_dataset import LeRobotDataset

# Fetches the dataset from the Hugging Face Hub on first use, then reads it from the local cache.
dataset = LeRobotDataset("lidavidsh/franka-pick-kitchen-up-wrist-100ep-genesis")
print(f"Episodes: {dataset.meta.total_episodes}, Frames: {len(dataset)}")
```
### Training with SmolVLA
```bash
python scripts/02_train_vla.py \
--dataset-id lidavidsh/franka-pick-kitchen-up-wrist-100ep-genesis \
--n-steps 4000 --batch-size 4 --num-workers 4 \
--save-dir outputs/smolvla_kitchen_wrist
```
### Evaluation (kitchen scene, wrist layout)
```bash
python scripts/04_eval_custom_scene.py \
--policy-type smolvla \
--checkpoint outputs/smolvla_kitchen_wrist/final \
--dataset-id lidavidsh/franka-pick-kitchen-up-wrist-100ep-genesis \
--camera-layout up_wrist \
--n-episodes 50 --max-steps 150 --seed 99
```
Note: evaluation **must** be run with `--camera-layout up_wrist` so the wrist camera is re-attached to the robot hand link at runtime — otherwise the `observation.images.side` feed will be a static side view, which is semantically incompatible with the trained policy.
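
Because the mismatch described in the note fails silently (the policy still receives an image under `observation.images.side`), a small guard in a wrapper script can catch it early. The function below is a hypothetical helper, not part of the workshop scripts; it relies on the naming heuristic that wrist-layout datasets contain `up-wrist` in their repo id, which holds for this dataset but is an assumption in general.

```python
def check_camera_layout(dataset_id: str, camera_layout: str) -> None:
    """Raise ValueError if a wrist-camera dataset is paired with a different layout."""
    # Assumption: wrist-layout datasets carry "up-wrist" in their repo id.
    # This is a naming heuristic, not something read from dataset metadata.
    is_wrist_dataset = "up-wrist" in dataset_id
    if is_wrist_dataset and camera_layout != "up_wrist":
        raise ValueError(
            f"{dataset_id} was recorded with a wrist camera; "
            f"got camera layout {camera_layout!r}, expected 'up_wrist'"
        )

# Passes silently for the matching layout.
check_camera_layout(
    "lidavidsh/franka-pick-kitchen-up-wrist-100ep-genesis", "up_wrist"
)
```

Calling it with any other layout string for this dataset raises a `ValueError` before a simulation episode is run.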
## Scene / Camera Comparison
| Dataset | Scene | Camera Layout | Baseline policy success (flat proxy) |
| --- | --- | --- | --- |
| [`franka-pick-100ep-genesis`](https://huggingface.co/datasets/lidavidsh/franka-pick-100ep-genesis) | Flat plane | up + side (static) | ~25 % |
| [`franka-pick-kitchen-100ep-genesis`](https://huggingface.co/datasets/lidavidsh/franka-pick-kitchen-100ep-genesis) | Kitchen | up + side (static) | — |
| **this dataset** | Kitchen | **up + wrist (eye-in-hand)** | **42.8 %** (flat proxy, R9f) |
## Data Generation
Generated with [`scripts/02_gen_data_custom_scene.py`](https://github.com/lidavidsh/Robot_synthetic_data_generation_workshop/blob/main/scripts/02_gen_data_custom_scene.py) using IK-planned trajectories:
```
home → hover above cube → descend → close gripper → lift → hold
```
Cube XY positions are randomized within the robot's reachable workspace (`dx ∈ [0.4, 0.7] m, dy ∈ [-0.2, 0.2] m`, seed = 42 for reproducibility).
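
The randomization can be pictured as seeded uniform sampling over that rectangle. The sketch below uses Python's standard `random` module and a hypothetical function name; the generation script's actual RNG, sampling order, and distribution are not documented here, so these samples will not match the positions in the dataset.

```python
import random

# Ranges quoted above, in metres.
DX_RANGE = (0.4, 0.7)
DY_RANGE = (-0.2, 0.2)

def sample_cube_positions(n_episodes, seed=42):
    """Return one (dx, dy) cube offset per episode, reproducible for a given seed."""
    # Assumption: independent uniform draws per axis; the real script may differ.
    rng = random.Random(seed)
    return [
        (rng.uniform(*DX_RANGE), rng.uniform(*DY_RANGE))
        for _ in range(n_episodes)
    ]

positions = sample_cube_positions(100)
print(positions[:3])
```

Using a dedicated `random.Random(seed)` instance rather than the module-level generator keeps the sequence reproducible even if other code draws random numbers in between.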
Exact reproduction command:
```bash
python scripts/02_gen_data_custom_scene.py \
--scene rustic_kitchen --anchor floor_origin \
--n-episodes 100 \
--camera-layout up_wrist \
--repo-id local/franka-pick-kitchen-up-wrist-100ep-genesis \
--seed 42
```
Runtime on RDNA4 R9700: **~23 min** for 100 episodes (13.9 s/ep, with all videos encoded using SVT-AV1).
Provider:
lidavidsh



