smol-libero
Source: ModelScope community · Updated 2025-12-05 · Indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/HuggingFaceVLA/smol-libero
Description:
This dataset was created using [LeRobot](https://github.com/huggingface/lerobot).
# Dataset Card for **Smol-LIBERO**
## Dataset Summary
Smol-LIBERO is a **compact version of the LIBERO benchmark**, built to make experimentation fast and accessible.
At just **1.79 GB** (compared to ~34 GB for the full LIBERO), it contains fewer trajectories and cameras while keeping the same multimodal structure.
Each sample includes:
- **Images** from two fixed cameras
- **Two types of robot state** (end-effector pose + gripper, and full 7-DoF joint positions)
- **Actions** (7-DoF joint commands)
This setup is especially useful for comparing **low-dimensional state inputs** with **high-dimensional visual inputs**, or combining them in multimodal training.
---
## Dataset Structure
### Data Fields
- **`observation.images.image`**: 256×256×3 RGB image (camera 1)
- **`observation.images.image2`**: 256×256×3 RGB image (camera 2)
- **`observation.state`** *(8 floats)*: end-effector Cartesian pose + gripper
`[x, y, z, roll, pitch, yaw, gripper, gripper]`
- **`observation.state.joint`** *(7 floats)*: full joint angles
`[joint_1, …, joint_7]`
- **`action`** *(7 floats)*: target joint commands
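The field layout above can be turned into a quick sanity check. The following is a minimal sketch, not part of the dataset's tooling: the helper names (`FIELD_SHAPES`, `nested_shape`, `check_sample`, `make_dummy_sample`) are hypothetical, and the height × width × channel image layout is an assumption taken from the "256×256×3" notation in the card. A real loader may return images in a different layout (for example channel-first), in which case the expected shapes would need adjusting.

```python
# Field names and shapes as listed in the dataset card.
# Assumption: images are stored as height x width x channels.
FIELD_SHAPES = {
    "observation.images.image": (256, 256, 3),
    "observation.images.image2": (256, 256, 3),
    "observation.state": (8,),
    "observation.state.joint": (7,),
    "action": (7,),
}


def nested_shape(value):
    """Shape of a nested list/tuple, assuming it is rectangular."""
    shape = []
    while isinstance(value, (list, tuple)):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def check_sample(sample):
    """Return a list of mismatches; an empty list means the layout matches."""
    problems = []
    for key, expected in FIELD_SHAPES.items():
        if key not in sample:
            problems.append(f"missing field: {key}")
            continue
        value = sample[key]
        # Arrays and tensors expose .shape; otherwise walk the nested lists.
        actual = tuple(getattr(value, "shape", None) or nested_shape(value))
        if actual != expected:
            problems.append(f"{key}: expected {expected}, got {actual}")
    return problems


def make_dummy_sample():
    """Synthetic, all-zero sample with the documented layout (not real data)."""
    image = [[[0, 0, 0] for _ in range(256)] for _ in range(256)]
    return {
        "observation.images.image": image,
        "observation.images.image2": image,
        "observation.state": [0.0] * 8,
        "observation.state.joint": [0.0] * 7,
        "action": [0.0] * 7,
    }


if __name__ == "__main__":
    print(check_sample(make_dummy_sample()))
```

Running `check_sample` on the first item returned by whatever loader is in use is a cheap way to catch a layout mismatch before training starts.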
---
## Why is it smaller than LIBERO?
- **Fewer trajectories/tasks** → subset of the full benchmark
- **Only two camera views** → reduced visual redundancy
- **Reduced total frames** → shorter episodes or lower FPS
That’s why Smol-LIBERO is **1.79 GB instead of ~34 GB**.
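To put the two sizes in proportion, the arithmetic can be sketched as below. Both numbers come from the card; the 34 GB figure is stated there as approximate, so the results are rough (about 5% of the full size, or roughly 19 times smaller).

```python
SMOL_GB = 1.79  # size stated in the card
FULL_GB = 34.0  # approximate size of full LIBERO, as stated in the card

fraction = SMOL_GB / FULL_GB       # share of the full benchmark's size
shrink_factor = FULL_GB / SMOL_GB  # how many times smaller Smol-LIBERO is

print(f"Smol-LIBERO is about {fraction:.1%} of full LIBERO")
print(f"That is roughly {shrink_factor:.0f}x smaller")
```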
---
## Intended Uses
- Quick prototyping and debugging
- Comparing joint-space vs. Cartesian state inputs
- Training small VLA baselines before scaling to LIBERO
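For the joint-space vs. Cartesian comparison, one possible setup is to switch the state input by name while keeping the rest of the pipeline fixed. The sketch below is an illustration under assumptions, not the authors' method: `STATE_VARIANTS` and `build_state_input` are hypothetical names, and only the two state field names come from the card.

```python
# Which dataset fields feed the policy's low-dimensional input in each run.
# The variant names are hypothetical; the field names come from the card.
STATE_VARIANTS = {
    "cartesian": ["observation.state"],        # 8 floats: pose + gripper
    "joint": ["observation.state.joint"],      # 7 floats: joint angles
    "both": ["observation.state", "observation.state.joint"],
}


def build_state_input(sample, variant):
    """Concatenate the chosen state fields into one flat list of floats."""
    vector = []
    for key in STATE_VARIANTS[variant]:
        vector.extend(float(x) for x in sample[key])
    return vector


if __name__ == "__main__":
    # Synthetic values only, to show the resulting input sizes.
    sample = {
        "observation.state": [0.0] * 8,
        "observation.state.joint": [0.0] * 7,
    }
    for name in STATE_VARIANTS:
        print(name, len(build_state_input(sample, name)))
```

Keeping the variant as a single configuration string makes it straightforward to train otherwise identical baselines and compare them.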
---
## Limitations
- Smaller task and visual diversity compared to LIBERO
- Only two fixed camera views
- May not fully represent generalization behavior on larger benchmarks
## Citation
**BibTeX:**
```bibtex
[More Information Needed]
```
Provider:
maas
Created:
2025-09-28



