gate-institute/GATE-VLAP-datasets

Name: gate-institute/GATE-VLAP-datasets
Creator: gate-institute
Published: 2025-12-07 10:33:36
License: No description provided

Source: Hugging Face. Updated 2025-12-07; indexed 2025-12-20.

Download link:

https://hf-mirror.com/datasets/gate-institute/GATE-VLAP-datasets


Resource description:

---
task_categories:
- reinforcement-learning
- robotics
tags:
- robotics
- libero
- manipulation
- semantic-action-chunking
- vision-language
- imitation-learning
size_categories:
- 100K<n<1M
---

# GATE-VLAP Datasets

**Grounded Action Trajectory Embeddings with Vision-Language Action Planning**

This repository contains preprocessed datasets from the LIBERO benchmark suite in WebDataset TAR format, designed for training vision-language-action models with semantic action segmentation.

## Data Format: WebDataset TAR

We provide the datasets in **WebDataset TAR format** for the following reasons:

- ✅ **Fast loading** - Efficient streaming during training
- ✅ **Easy downloading** - Single file per subtask
- ✅ **HuggingFace optimized** - Quick browsing and file listing
- ✅ **Inspectable** - Extract locally to view individual frames

### Extracting TAR Files

```bash
# Download a subtask
wget https://huggingface.co/datasets/gate-institute/GATE-VLAP-datasets/resolve/main/libero_10/pick_up_the_black_bowl.tar

# Extract all files
tar -xf pick_up_the_black_bowl.tar

# View structure
ls
# Output: demo_0/ demo_1/ demo_2/ ...

# View demo contents
ls demo_0/
# Output: demo_0_timestep_0000.png demo_0_timestep_0000.json
#         demo_0_timestep_0001.png demo_0_timestep_0001.json
#         ...
```

### Loading Raw Data (After Extraction)

```python
import json
from pathlib import Path

import numpy as np
from PIL import Image


def load_demo(demo_dir):
    """Load a single demonstration from an extracted TAR."""
    frames = []
    demo_path = Path(demo_dir)

    for json_file in sorted(demo_path.glob("*.json")):
        # Load metadata
        with open(json_file) as f:
            data = json.load(f)

        # Load the image that shares the JSON file's basename
        png_file = json_file.with_suffix(".png")
        data["image"] = np.array(Image.open(png_file))

        frames.append(data)

    return frames


# After extracting pick_up_the_black_bowl.tar
demo = load_demo("demo_0")
print(f"Demo length: {len(demo)} frames")
print(f"Action shape: {np.array(demo[0]['action']).shape}")
```

### Loading with WebDataset (Direct Streaming)

```python
import numpy as np
import webdataset as wds

# Stream data directly from HuggingFace (no download needed!)
url = "https://huggingface.co/datasets/gate-institute/GATE-VLAP-datasets/resolve/main/libero_10/pick_up_the_black_bowl.tar"

# decode("pil") yields PIL images; .json entries are decoded to dicts
dataset = wds.WebDataset(url).decode("pil")

for sample in dataset:
    # sample["png"]  = PIL Image (128x128 RGB)
    # sample["json"] = dict (JSON metadata, already decoded)
    metadata = sample["json"]
    image = sample["png"]

    print(f"Action: {metadata['action']}")
    print(f"Image shape: {np.array(image).shape}")
    break
```

### Training with Multiple Subtasks

```python
import torch
import webdataset as wds
from torch.utils.data import DataLoader

# Load multiple subtasks at once
base_url = "https://huggingface.co/datasets/gate-institute/GATE-VLAP-datasets/resolve/main/libero_10/"
subtasks = ["pick_up_the_black_bowl", "close_the_drawer", "open_the_top_drawer"]
urls = [f"{base_url}{task}.tar" for task in subtasks]


def preprocess_fn(sample):
    """Example preprocessing; replace with your own."""
    image, metadata = sample  # decode("rgb") gives an HxWx3 float array in [0, 1]
    image = torch.from_numpy(image).permute(2, 0, 1)  # HWC -> CHW
    action = torch.tensor(metadata["action"], dtype=torch.float32)
    return image, action


dataset = (
    wds.WebDataset(urls)
    .decode("rgb")
    .to_tuple("png", "json")
    .map(preprocess_fn)
)

# Use no more workers than shards, otherwise some workers receive no data
dataloader = DataLoader(dataset, batch_size=32, num_workers=min(4, len(urls)))

for images, actions in dataloader:
    # Train your model
    pass
```

## Datasets Included

### LIBERO-10 (Long-Horizon Tasks)

- **Task Type**: 10 complex, long-horizon manipulation tasks
- **Segmentation Method**: Semantic Action Chunking using Gemini Vision API
- **Demos**: 1,354 demonstrations across 29 subtasks
- **Frames**: 103,650 total frames
- **TAR Files**: 29 files (one per subtask)

**Example Tasks**:

- `pick_up_the_black_bowl.tar` → Pick and place subtasks
- `close_the_drawer.tar` → Approach, grasp, close subtasks
- `put_the_bowl_in_the_drawer.tar` → Multi-step pick, open, place, close sequence

### LIBERO-Object (Object Manipulation)

- **Task Type**: 10 object-centric manipulation tasks
- **Segmentation Method**: Semantic Action Chunking using Gemini Vision API
- **Demos**: 875 demonstrations across 20 subtasks
- **Frames**: 66,334 total frames
- **TAR Files**: 20 files (one per subtask)

**Example Tasks**:

- `pick_up_the_alphabet_soup.tar` → Approach, grasp, lift
- `place_the_alphabet_soup_on_the_basket.tar` → Move, position, place, release

## 📁 Dataset Structure

```
gate-institute/GATE-VLAP-datasets/
├── libero_10/                    # Long-horizon tasks (29 TAR files)
│   ├── close_the_drawer.tar
│   ├── pick_up_the_black_bowl.tar
│   ├── open_the_top_drawer.tar
│   └── ... (26 more)
│
├── libero_object/                # Object manipulation (20 TAR files)
│   ├── pick_up_the_alphabet_soup.tar
│   ├── place_the_alphabet_soup_on_the_basket.tar
│   └── ... (18 more)
│
└── metadata/                     # Dataset statistics & segmentation
    ├── libero_10_complete_stats.json
    ├── libero_10_all_segments.json
    ├── libero_object_complete_stats.json
    └── libero_object_all_segments.json
```

### Inside Each TAR File

After extracting `pick_up_the_black_bowl.tar`:

```
pick_up_the_black_bowl/
├── demo_0/
│   ├── demo_0_timestep_0000.png   # RGB observation (128×128)
│   ├── demo_0_timestep_0000.json  # Action + metadata
│   ├── demo_0_timestep_0001.png
│   ├── demo_0_timestep_0001.json
│   └── ...
├── demo_1/
│   └── ...
└── ... (all demos for this subtask)
```

## Data Format

### JSON Metadata (per timestep)

Each `.json` file contains the following fields (the `//` comments are annotations for this README, not part of the files):

```json
{
  "action": [0.1, -0.2, 0.0, 0.0, 0.0, 0.0, 1.0],  // 7-DOF action
  "robot_state": [...],                            // Joint state
  "demo_id": "demo_0",
  "timestep": 42,
  "subtask": "pick_up_the_black_bowl",
  "parent_task": "LIBERO_10",
  "is_stop_signal": false                          // Segment boundary
}
```

### Action Space

- **Dimensions**: 7-DOF
  - `[0:3]`: End-effector position delta (x, y, z)
  - `[3:6]`: End-effector orientation delta (roll, pitch, yaw)
  - `[6]`: Gripper action (0.0 = close, 1.0 = open)
- **Range**: Normalized to [-1, 1]
- **Control**: Delta actions (relative to current pose)

### Image Format

- **Resolution**: 128×128 pixels
- **Channels**: RGB (3 channels)
- **Format**: PNG (lossless compression)
- **Camera**: Front-facing agentview camera

## Metadata Files Explained

### 1. `libero_10_complete_stats.json`

**Purpose**: Overview statistics for the entire LIBERO-10 dataset

**Use Cases**:

- Understand dataset composition
- Plan training splits
- Check demo/frame distribution across tasks

### 2. `libero_10_all_segments.json`

**Purpose**: Detailed segmentation metadata for each demonstration

Contains semantic action chunks with:

- Segment boundaries (start/end frames)
- Action descriptions
- Segment types (reach, grasp, move, place, etc.)
- Gemini Vision API segmentation method

**Use Cases**:

- Train with semantic action chunks
- Implement hierarchical policies
- Analyze action primitives
- Filter by segment type

### 3. `libero_object_complete_stats.json`

**Purpose**: Statistics for the LIBERO-Object dataset

### 4. `libero_object_all_segments.json`

**Purpose**: Segmentation metadata for LIBERO-Object demonstrations with semantic action chunking

## Citation

If you use this dataset, please cite:

```bibtex
@article{gateVLAP@SAC2026,
  title={Atomic Action Slicing: Planner-Aligned Options for Generalist VLA Agents},
  author={Stefan Tabakov and Asen Popov and Dimitar Dimitrov and Ensiye Kiyamousavi and Boris Kraychev},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  conference={The 41st ACM/SIGAPP Symposium On Applied Computing (SAC2026), track on Intelligent Robotics and Multi-Agent Systems (IRMAS)},
  year={2025}
}

@inproceedings{liu2023libero,
  title={LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning},
  author={Liu, Bo and Zhu, Yifeng and Gao, Chongkai and Feng, Yihao and Liu, Qiang and Zhu, Yuke and Stone, Peter},
  booktitle={NeurIPS Datasets and Benchmarks Track},
  year={2023}
}
```

## Related Resources

- **Model Checkpoints**: [gate-institute/GATE-VLAP](https://huggingface.co/gate-institute/GATE-VLAP)
- **Original LIBERO**: [https://github.com/Lifelong-Robot-Learning/LIBERO](https://github.com/Lifelong-Robot-Learning/LIBERO)
- **Paper**: Coming soon

## Acknowledgments

- **LIBERO Benchmark**: Original dataset by Liu et al. (2023)
- **Segmentation**: Gemini Vision API for semantic action chunking
- **Institution**: [GATE Institute](https://www.gate-ai.eu/en/home/), Sofia, Bulgaria

## Contact

For questions or issues, please contact the [GATE Institute](https://www.gate-ai.eu/en/home/).

---

**Dataset Version**: 1.0

**Last Updated**: December 2025

**Maintainer**: [GATE Institute](https://www.gate-ai.eu/en/home/)
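
The per-timestep JSON schema described under "Data Format" marks segment boundaries with `is_stop_signal`. As a minimal sketch of how that flag could be used to group frames into semantic action chunks, the helper below splits a list of frame dictionaries into segments. The function name `split_into_segments` is hypothetical, and the code assumes (this is not confirmed by the dataset card) that a frame with `is_stop_signal == true` is the last frame of its segment. The example frames are synthetic, not taken from the dataset.

```python
from typing import Any, Dict, List

Frame = Dict[str, Any]


def split_into_segments(frames: List[Frame]) -> List[List[Frame]]:
    """Group frames into segments, closing a segment after each stop signal.

    Assumption (not confirmed by the dataset card): a frame whose
    is_stop_signal is True is the LAST frame of its segment.
    """
    segments: List[List[Frame]] = []
    current: List[Frame] = []
    # Order frames by their documented "timestep" field before grouping.
    for frame in sorted(frames, key=lambda f: f["timestep"]):
        current.append(frame)
        if frame.get("is_stop_signal", False):
            segments.append(current)
            current = []
    if current:  # trailing frames without a closing stop signal
        segments.append(current)
    return segments


# Synthetic frames that mimic the documented schema (values are made up).
example_frames = [
    {"timestep": t, "action": [0.0] * 7, "is_stop_signal": t in (2, 4)}
    for t in range(6)
]
segments = split_into_segments(example_frames)
segment_lengths = [len(s) for s in segments]
print(segment_lengths)  # [3, 2, 1]
```

The same function could be applied to the list returned by `load_demo`, since each entry there carries the JSON fields alongside the image. If the stop signal instead marks the first frame of the next segment, the `append` and the boundary check would need to be swapped.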

Provider:

gate-institute
