Name: lidavidsh/franka-pick-kitchen-up-wrist-100ep-genesis
Creator: lidavidsh
Published: 2026-04-17 04:53:56
License: No description available

Download link:

https://hf-mirror.com/datasets/lidavidsh/franka-pick-kitchen-up-wrist-100ep-genesis

Resource description:

---
license: apache-2.0
task_categories:
- robotics
tags:
- LeRobot
- Genesis
- Franka
- manipulation
- synthetic-data
- kitchen-scene
- eye-in-hand
- wrist-camera
size_categories:
- 10K<n<100K
---

# Franka Pick-Cube: Kitchen Scene, Overhead + Wrist Camera (100 episodes)

Synthetic expert demonstrations of a **Franka Panda** picking a red cube in a **realistic rustic kitchen** environment, captured with a **2-camera layout: static overhead + eye-in-hand wrist camera**. Generated with the Genesis physics simulator on an **AMD Radeon AI PRO R9700 (RDNA4)** GPU.

Part of the [Robot Synthetic Data Generation Workshop](https://github.com/lidavidsh/Robot_synthetic_data_generation_workshop).

> **Companion dataset**: [`lidavidsh/franka-pick-kitchen-100ep-genesis`](https://huggingface.co/datasets/lidavidsh/franka-pick-kitchen-100ep-genesis) contains the same 100 episodes with the same seed, but uses a **static side camera** instead of the wrist camera. Use both for a head-to-head camera-layout ablation.

## Dataset Details

| Item | Value |
| --- | --- |
| Format | LeRobot v3.0, AV1 video |
| Scene | rustic_kitchen (126.9 MB GLB mesh) |
| Anchor | floor_origin |
| Episodes / Frames | 100 / 13,500 |
| FPS | 30 |
| Cameras | 2 × 640×480: `observation.images.up` (overhead) + `observation.images.side` (**wrist / eye-in-hand**) |
| Action space | 9-DoF joint position (7 arm + 2 finger) |
| Robot | Franka Panda |
| Gen success rate | 100 / 100 = 100 % |
| Size | ~198 MB |
| Generation GPU | AMD Radeon AI PRO R9700 (RDNA4, gfx1201) |
| Software | Genesis 0.4.5, LeRobot 0.4.4, ROCm 7.2, seed = 42 |

## Camera Layout (D040 configuration)

The **wrist camera** is rigidly attached to the Franka `hand` link with the following hand-local pose ("D040" config), empirically validated on the flat scene with a **42.8 % mean policy success rate** (vs. 25 % for the up+side baseline, R9e/R9f experiments):

| Parameter | Value | Note |
| --- | --- | --- |
| Position | `(0.05, 0.00, -0.08)` m | ~8 cm below hand origin, slightly offset toward fingers |
| Look-at | `(0.00, 0.00, 0.10)` m | Looks toward the space between the gripper fingers |
| Up-vector | `(0, 0, -1)` | Roll flipped so gripper opening is horizontal in the view |
| FOV | 65° | Wide enough to keep the cube in frame during approach |
| Resolution | 640 × 480 | Same as overhead |

The **overhead camera** is world-fixed at a canonical top-down pose identical to the companion `up+side` dataset.

**Why D040 over a closer / narrower wrist pose?** See the "D040 vs D000 对比" ("D040 vs D000 comparison") section of `exps/rdna4_exp.md` in the workshop repo. In short, D040 trades human-readable cube visibility for lower motion sensitivity, broader context, and clear information-orthogonality with the overhead camera, which empirically yields a much more robust policy.
## Usage

```python
from lerobot.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("lidavidsh/franka-pick-kitchen-up-wrist-100ep-genesis")
print(f"Episodes: {dataset.meta.total_episodes}, Frames: {len(dataset)}")
```

### Training with SmolVLA

```bash
python scripts/02_train_vla.py \
    --dataset-id lidavidsh/franka-pick-kitchen-up-wrist-100ep-genesis \
    --n-steps 4000 --batch-size 4 --num-workers 4 \
    --save-dir outputs/smolvla_kitchen_wrist
```

### Evaluation (kitchen scene, wrist layout)

```bash
python scripts/04_eval_custom_scene.py \
    --policy-type smolvla \
    --checkpoint outputs/smolvla_kitchen_wrist/final \
    --dataset-id lidavidsh/franka-pick-kitchen-up-wrist-100ep-genesis \
    --camera-layout up_wrist \
    --n-episodes 50 --max-steps 150 --seed 99
```

Note: evaluation **must** be run with `--camera-layout up_wrist` so the wrist camera is re-attached to the robot hand link at runtime. Otherwise the `observation.images.side` feed will be a static side view, which is semantically incompatible with the trained policy.
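Because the wrist feed is stored under the `observation.images.side` key, a quick structural check on a sample can catch a mismatched dataset before training or evaluation. The sketch below is illustrative only: `check_sample` is a hypothetical helper, and the `action` key name is an assumption based on the usual LeRobot convention rather than something this card states. It verifies key presence and the 9-DoF action size; it cannot tell a wrist view from a static side view.

```python
# Camera keys and action size as listed in the Dataset Details table.
EXPECTED_CAMERA_KEYS = ("observation.images.up", "observation.images.side")
EXPECTED_ACTION_DIM = 9  # 7 arm joints + 2 fingers


def check_sample(sample):
    """Return a list of problems found in one sample dict (empty if none).

    Hypothetical helper; assumes the action is stored under "action".
    """
    problems = []
    for key in EXPECTED_CAMERA_KEYS:
        if key not in sample:
            problems.append(f"missing camera key: {key}")
    action = sample.get("action")
    if action is None:
        problems.append("missing 'action'")
    elif len(action) != EXPECTED_ACTION_DIM:
        problems.append(
            f"action has {len(action)} dims, expected {EXPECTED_ACTION_DIM}"
        )
    return problems


# Stand-in sample so the sketch runs without downloading the dataset;
# with the real dataset one would pass something like dataset[0].
fake_sample = {
    "observation.images.up": object(),
    "observation.images.side": object(),
    "action": [0.0] * 9,
}
print(check_sample(fake_sample))
```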
## Scene / Camera Comparison

| Dataset | Scene | Camera Layout | Baseline policy success (flat proxy) |
| --- | --- | --- | --- |
| [`franka-pick-100ep-genesis`](https://huggingface.co/datasets/lidavidsh/franka-pick-100ep-genesis) | Flat plane | up + side (static) | ~25 % |
| [`franka-pick-kitchen-100ep-genesis`](https://huggingface.co/datasets/lidavidsh/franka-pick-kitchen-100ep-genesis) | Kitchen | up + side (static) | n/a |
| **this dataset** | Kitchen | **up + wrist (eye-in-hand)** | **42.8 %** (flat proxy, R9f) |

## Data Generation

Generated with [`scripts/02_gen_data_custom_scene.py`](https://github.com/lidavidsh/Robot_synthetic_data_generation_workshop/blob/main/scripts/02_gen_data_custom_scene.py) using IK-planned trajectories:

```
home → hover above cube → descend → close gripper → lift → hold
```

Cube XY positions are randomized within the robot's reachable workspace (`dx ∈ [0.4, 0.7] m, dy ∈ [-0.2, 0.2] m`, seed = 42 for reproducibility).

Exact reproduction command:

```bash
python scripts/02_gen_data_custom_scene.py \
    --scene rustic_kitchen --anchor floor_origin \
    --n-episodes 100 \
    --camera-layout up_wrist \
    --repo-id local/franka-pick-kitchen-up-wrist-100ep-genesis \
    --seed 42
```

Runtime on RDNA4 R9700: **~23 min** for 100 episodes (13.9 s/ep, all videos encoded with SVT-AV1).
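The seeded cube-placement randomization described above can be illustrated with a short standard-library sketch. This is an assumption-laden illustration, not the generation script's code: `sample_cube_positions` is a hypothetical name, and the script's actual random number generator, distribution, and sampling order are not documented here. The sketch therefore shows the idea of reproducible uniform sampling within the stated ranges and will not reproduce the dataset's actual cube positions.

```python
import random

# Ranges quoted in the Data Generation section (metres).
DX_RANGE = (0.4, 0.7)
DY_RANGE = (-0.2, 0.2)


def sample_cube_positions(n_episodes, seed=42):
    """Draw one (dx, dy) cube offset per episode from a seeded RNG.

    Hypothetical helper: assumes independent uniform sampling per axis.
    """
    rng = random.Random(seed)  # private RNG, so global state is untouched
    return [
        (rng.uniform(*DX_RANGE), rng.uniform(*DY_RANGE))
        for _ in range(n_episodes)
    ]


positions = sample_cube_positions(100)
print(f"sampled {len(positions)} positions, first: {positions[0]}")
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps the sequence reproducible even when other code also draws random numbers.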

Application scenarios: