HuggingFaceVLA/community_dataset_v3

Name: HuggingFaceVLA/community_dataset_v3
Creator: HuggingFaceVLA
Published: 2025-12-10 16:07:35
License: no description provided

Source: Hugging Face. Updated 2025-12-10; indexed 2025-12-20.

Download link:

https://hf-mirror.com/datasets/HuggingFaceVLA/community_dataset_v3


Description:

---
license: apache-2.0
tags:
- robotics
- community
- so100
- so101
- manipulation
- smolvla
- lerobot community
- vision-language-action
- embodied-ai
- cross-embodiment
task_categories:
- robotics
language:
- en
size_categories:
- 10M<n<100M
pretty_name: Community Dataset v3
---

# LeRobot Community Datasets v3 - A Cross-Embodiment Pretraining Dataset for Vision Language Action Models

A large-scale robotics dataset for vision-language-action learning, featuring **791 datasets** across **46 robot types**, enabling cross-embodiment pretraining for generalist robot policies.

![3](https://cdn-uploads.huggingface.co/production/uploads/640e21ef3c82bd463ee5a76d/5v1NlxUHR_HV0tXYTK0ux.png)

## Overview

This is a **crowdsourced, open-source dataset** compiled from **235 community contributors** worldwide. Building upon the pretraining datasets used for [SmolVLA](https://huggingface.co/blog/smolvla), [Community Datasets v1](https://huggingface.co/datasets/HuggingFaceVLA/community_dataset_v1) and [v2](https://huggingface.co/datasets/HuggingFaceVLA/community_dataset_v2), this cleaned and organized version opens the door for **cross-embodiment training** on another completely new batch of community-contributed data.

The dataset spans **46+ robot embodiments** including single-arm, bimanual, mobile manipulation, and a few humanoid robots. All data was collected using the [LeRobot](https://github.com/huggingface/lerobot) framework and is compatible with the [VLAb](https://github.com/huggingface/VLAb) pretraining framework.

## 📊 Dataset Statistics

| Metric | Value |
|--------|-------|
| **Total Datasets** | 791 |
| **Total Episodes** | 50,622 |
| **Total Frames** | 25,971,082 |
| **Total Duration** | 251.5 hours (10.5 days) |
| **Contributors** | 235 |
| **Robot Types** | 46 different embodiments |
| **Action Dimensions** | 12 different configurations |
| **Average Hours/Dataset** | 0.30 |

## 🤖 Robot Type Distribution

### By Category

- **Single-arm manipulators**: 88.4% (699 datasets)
- **Bimanual systems**: 6.7% (53 datasets)
- **Mobile manipulation**: 3.4% (27 datasets)
- **Humanoid platforms**: 1.3% (10 datasets)
- **Other configurations**: 0.3% (2 datasets)

### Top 10 Robot Types

| Robot Type | Datasets | % | Category |
|------------|----------|---|----------|
| **so100** | 248 | 31.4% | Single-arm |
| **so101_follower** | 124 | 15.7% | Single-arm |
| **so100_follower** | 121 | 15.3% | Single-arm |
| **so101** | 82 | 10.4% | Single-arm |
| **arx5** | 43 | 5.4% | Single-arm |
| **koch** | 38 | 4.8% | Single-arm |
| **trossen_ai_mobile** | 25 | 3.2% | Mobile |
| **bi_xarm6_follower** | 16 | 2.0% | Bimanual |
| **so100_bimanual** | 12 | 1.5% | Bimanual |
| **koch_follower** | 8 | 1.0% | Single-arm |

![6](https://cdn-uploads.huggingface.co/production/uploads/640e21ef3c82bd463ee5a76d/Trg81uewHlGHH30NZukK5.png)

## 🗂️ Dataset Structure

```
community_dataset_v3_clean/
├── contributor1/
│   ├── dataset_name_1/
│   │   ├── data/              # Parquet files with observations
│   │   │   ├── episode_000000.parquet
│   │   │   ├── episode_000001.parquet
│   │   │   └── ...
│   │   ├── videos/            # MP4 recordings (multi-view)
│   │   │   ├── episode_000000_image.mp4
│   │   │   └── ...
│   │   └── meta/              # Metadata
│   │       └── info.json
│   └── dataset_name_2/
├── contributor2/
└── ...
```

## 🚀 Usage

### Authenticate with Hugging Face

You need to be logged in to access the dataset:

```bash
# Login to Hugging Face
huggingface-cli login

# Or alternatively, set your token as an environment variable
# export HF_TOKEN=your_token_here
```

Get your token from [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)

### Download the Dataset

```bash
hf download HuggingFaceVLA/community_dataset_v3 \
  --repo-type=dataset \
  --local-dir /path/local_dir/community_dataset_v3
```

### Load Individual Datasets

```python
import os

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Local copy of the dataset; adjust to the --local-dir used for the download
root = "./community_dataset_v3"

# Browse available datasets
for contributor in sorted(os.listdir(root)):
    contributor_path = os.path.join(root, contributor)
    if os.path.isdir(contributor_path):
        for dataset in sorted(os.listdir(contributor_path)):
            print(f"📁 {contributor}/{dataset}")

# Load a specific dataset
dataset = LeRobotDataset(
    repo_id="local",
    root=os.path.join(root, "contributor_name", "dataset_name"),
)

# Access data
print(f"Episodes: {dataset.num_episodes}")
print(f"Total frames: {len(dataset)}")
```

### Train with VLAb

This dataset is designed for cross-embodiment VLA training using [VLAb](https://github.com/huggingface/VLAb):

```bash
accelerate launch --config_file accelerate_configs/multi_gpu.yaml \
  src/lerobot/scripts/train.py \
  --policy.type=smolvla2 \
  --policy.repo_id=HuggingFaceTB/SmolVLM2-500M-Video-Instruct \
  --dataset.repo_id="community_dataset_v3/contributor1/dataset1,community_dataset_v3/contributor2/dataset2" \
  --dataset.root="./community_dataset_v3" \
  --dataset.video_backend=pyav \
  --dataset.features_version=2 \
  --output_dir="./outputs/training" \
  --batch_size=8 \
  --steps=200000 \
  --wandb.enable=true \
  --wandb.project="smolvla2-cross-embodiment"
```

## Training Challenges with Cross-Embodiment Data

### The Reality of Community-Contributed Data

This dataset includes 791 datasets recorded by community members under different conditions worldwide, creating an authentic in-the-wild setup. While this diversity is valuable for cross-embodiment learning, it comes with real challenges: varying data quality, inconsistent recording setups, and heterogeneous robot configurations. Using these datasets out-of-the-box will likely result in intermittent collate errors and warnings during training.

### What We Encountered During Data Cleaning

Starting with 851 datasets, we systematically debugged and cleaned the collection. Here's what we found:

#### 1. Missing Video Files (Primary Removal Reason)

Some datasets had incomplete episode recordings where video files were missing:

```
ERROR Failed to load video for key 'observation.images.image' at episode X: [Errno 2] No such file or directory: '/path/to/episode_XXXXXX.mp4'
```

**Impact:** Training crashed whenever one of these episodes was sampled

**Action:** Removed ~15-20 datasets with missing files

#### 2. Data Type Incompatibilities

Certain datasets returned inconsistent data types during batch formation:

```
RuntimeError: Could not infer dtype of dict
AttributeError: 'list' object has no attribute 'device'
```

**Impact:** Random crashes during the forward pass

**Action:** Removed ~10-15 problematic datasets, implemented resilient batch collation

#### 3. Multi-Camera Configuration Issues

Different datasets had varying numbers of camera views, causing tensor shape mismatches:

**Root cause:** The `max_num_images` parameter wasn't properly propagated in the codebase, leading to inconsistent image tensor shapes when datasets had different numbers of cameras (some had 2, others had 4+ views).

**Impact:** Thousands of dimension/channel errors for the datasets with more than 3 images.

**Action:** Set `config.max_num_images = 3` to standardize input. This number balances multi-view information (essential for spatial reasoning) while being compatible with most datasets in the collection - the majority of community datasets use 2-3 camera views for manipulation tasks.

#### 4. Video Timing Misalignments

Frame timestamps occasionally violated tolerance thresholds:

```
Some query timestamps violate tolerance (tensor([2.0667]) > tolerance_s=0.0001)
```

**Impact:** Minor temporal inconsistency, but training continued

**Action:** Automatic fallback to closest frames

### Final Dataset Cleaning Results

- **Original datasets:** 851
- **Datasets with missing files:** ~15-20 (removed)
- **Datasets with data type issues:** ~10-15 (removed)
- **Datasets with conversion failures:** 16 (fixed and reprocessed)
- **Datasets with different FPS values:** Many datasets remain valid but have varying frame rates (some were recorded at an fps other than the standard 30 fps)
- **Final clean dataset:** 791 datasets

## 🎯 Intended Use

This dataset enables:

- **Cross-embodiment VLA training** - Learn policies that generalize across robot types
- **Multi-task manipulation** - Pick & place, sorting, assembly, bimanual tasks
- **Transfer learning** - Leverage diverse demonstrations for new robots
- **Imitation learning research** - Large-scale behavior cloning
- **Generalist robot policies** - Train models that work on multiple platforms
- **Mobile manipulation** - Navigation + manipulation tasks
- **Embodied AI research** - Vision-motor coordination

## 🏆 Top Contributors

| Contributor | Datasets | % |
|-------------|----------|---|
| **shuohsuan** | 57 | 7.2% |
| **villekuosmanen** | 47 | 5.9% |
| **LeRobot-worldwide-hackathon** | 31 | 3.9% |
| **lt-s** | 27 | 3.4% |
| **Qipei** | 23 | 2.9% |
| **bjb7** | 18 | 2.3% |
| **kumarhans** | 18 | 2.3% |
| **Ryosei2** | 17 | 2.1% |
| **kyomangold** | 16 | 2.0% |
| **psg777** | 16 | 2.0% |

## 🤝 Contributing

Future contributions should follow:

- LeRobot dataset format (v2.1+)
- Consistent naming for features and camera views
- Quality validation checks
- Precise task descriptions
- Robot type and action space metadata

See the [LeRobot dataset guide](https://huggingface.co/blog/lerobot-datasets) for best practices.

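One of the quality validation checks mentioned above could target the most common removal reason reported during cleaning: episodes whose video files are missing. The following is a minimal sketch using only the Python standard library, not the cleaning code used for this dataset. The helper names `episode_ids` and `check_dataset` are hypothetical, and the sketch assumes that episode files carry an `episode_<index>` name under `data/` and `videos/`, as in the directory tree above, and that `meta/info.json` may contain a `robot_type` key.

```python
import json
import re
from pathlib import Path

# Matches the episode index in names such as episode_000003_image.mp4
EPISODE_RE = re.compile(r"episode_(\d+)")


def episode_ids(folder: Path, suffix: str) -> set:
    """Collect episode indices from all files with the given suffix under folder."""
    if not folder.is_dir():
        return set()
    ids = set()
    for path in folder.rglob(f"*{suffix}"):
        match = EPISODE_RE.search(path.name)
        if match:
            ids.add(int(match.group(1)))
    return ids


def check_dataset(dataset_dir: str) -> dict:
    """Report basic completeness problems for one dataset folder (hypothetical helper)."""
    root = Path(dataset_dir)
    info_path = root / "meta" / "info.json"
    report = {"has_info": info_path.is_file(), "robot_type": None}
    if report["has_info"]:
        # Assumption: info.json stores the embodiment under "robot_type"; None if absent.
        report["robot_type"] = json.loads(info_path.read_text()).get("robot_type")
    data_ids = episode_ids(root / "data", ".parquet")
    video_ids = episode_ids(root / "videos", ".mp4")
    report["episodes"] = len(data_ids)
    # An episode counts as covered if at least one camera video exists for it;
    # this does not verify that every camera view is present.
    report["missing_videos"] = sorted(data_ids - video_ids)
    return report
```

Running `check_dataset` on each `contributor/dataset_name` folder before upload and confirming that `missing_videos` is empty would catch incomplete recordings early. Because the search is recursive, the same check also works when episode files are nested in subfolders.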
Please acknowledge all individual contributors who created the original datasets.

## 📄 License

Released under **Apache 2.0 license**. Individual datasets may have additional attribution requirements.

When using this dataset:

- ✅ Cite the dataset and VLAb framework
- ✅ Acknowledge community contributors
- ✅ Follow Apache 2.0 license terms
- ✅ Consider contributing your own data

## 🔗 Related Resources

- [VLAb Framework](https://github.com/huggingface/VLAb) - Large-scale pre-training
- [SmolVLA Model](https://huggingface.co/lerobot/smolvla_base) - Pre-trained VLA
- [SmolVLA Blog](https://huggingface.co/blog/smolvla) - Introduction and tutorials
- [SmolVLA Paper](https://huggingface.co/papers/2506.01844) - Technical details
- [LeRobot Docs](https://huggingface.co/docs/lerobot) - Complete documentation
- [Dataset Guide](https://huggingface.co/blog/lerobot-datasets) - Best practices
- [Community Dataset v2](https://huggingface.co/datasets/HuggingFaceVLA/community_dataset_v2) - Previous Dataset
- [Community Dataset v1](https://huggingface.co/datasets/HuggingFaceVLA/community_dataset_v1) - First release

*Built with ❤️ by the LeRobot Community and SmolVLA Team*

Provided by:

HuggingFaceVLA
