so-arm-101-pouring-0.1

Name: so-arm-101-pouring-0.1
Creator: di-techinnova
Published: 2026-04-17 15:03:06
License: Apache-2.0

Source: Hugging Face. Updated: 2026-04-17. Indexed: 2026-04-17.

Download link:

https://huggingface.co/datasets/di-techinnova/so-arm-101-pouring-0.1

Overview:

This dataset contains 135 multi-task robotic manipulation episodes focused on pouring and grasping, collected with the SO-ARM-101 robotic arm (a derivative of the Waveshare/Koch Arm) in a Leader-Follower (Master-Slave) teleoperation setup. It is intended for training and evaluating Vision-Language-Action (VLA) models such as SmolVLA or X-VLA, with an emphasis on high-precision visual grounding and long-horizon action sequences. The dataset follows the LeRobot v3.0 format, with Parquet files storing telemetry data and MP4 files storing video streams. It covers several tasks across two primary domains (seeds and coffee): pouring seeds, pouring coffee, visual grounding, and long-horizon composite tasks. The dataset card also describes the robot type, control frequency, visual modalities, environment, task instructions, features, technical details, and action space.

Provider:

di-techinnova

Created:

2026-04-15

Summary of original information

Dataset overview

Basic information

Dataset name: SO-ARM-101 Pouring Seeds and Coffee Dataset for VLA Training
Publisher: Data Impact VN - Technology Innovation Department
Release date: 2026
License: Apache-2.0
Task category: Robotics
Tags: LeRobot
Dataset URL: https://huggingface.co/datasets/di-techinnova/so-arm-101-pouring-0.1

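Dataset contents

Total episodes: 135
Total frames: 65,250
Robot type: SO-ARM-101 (6 degrees of freedom: 5 joints + 1 gripper)
Control frequency: 15 Hz
Environment: An office meeting table with a wood-grain surface, set against a high-contrast yellow background.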
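
The episode and frame counts above imply a total recording time and an average episode length. The following is a quick back-of-the-envelope calculation, assuming every frame was recorded at the nominal 15 Hz; the variable names are illustrative and not part of the dataset.

```python
# Figures quoted in the dataset summary.
TOTAL_EPISODES = 135
TOTAL_FRAMES = 65_250
CONTROL_HZ = 15  # nominal control/recording frequency (assumed constant)

# Total duration of the recorded data.
total_seconds = TOTAL_FRAMES / CONTROL_HZ
total_minutes = total_seconds / 60

# Average episode length in frames and in seconds.
avg_frames = TOTAL_FRAMES / TOTAL_EPISODES
avg_seconds = avg_frames / CONTROL_HZ

print(f"Total: {total_seconds:.0f} s ({total_minutes:.1f} min)")
print(f"Average episode: {avg_frames:.1f} frames ({avg_seconds:.1f} s)")
```

Under that assumption the dataset holds roughly 72.5 minutes of demonstrations, with an average episode of about 32 seconds.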

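Visual modalities

camera1: Wrist-mounted camera for high-precision manipulation and an object-centric view. Resolution: 1280x720.
camera2: Global/portal view from an Android phone camera, providing scene context. Resolution: 640x360.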

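Task instructions

The dataset covers different tasks across two main domains (seeds and coffee):

Pouring seeds: "Pour the sunflower seeds from the orange cup into the transparent cup."
Pouring coffee (standard): "Pour the coffee from the orange cup into the cup with the D sticker."
Visual grounding (high contrast): "Pour the coffee into the cup with the black-bordered letter D."
Long-horizon composite: "Pour the coffee, then hold the cup marked with D." (A sequential task that requires a 0.5-second pause between actions.)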
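
At 15 Hz, the 0.5-second pause required by the long-horizon task spans about 7.5 control steps. The sketch below shows one way such pauses might be located in a sequence of action vectors. The helper `find_pauses`, the tolerance `tol_deg`, and the choice to round the minimum length down to 7 frames are all assumptions made for illustration; the dataset card does not describe how pauses were annotated or checked.

```python
import math

CONTROL_HZ = 15
PAUSE_SECONDS = 0.5

def find_pauses(actions, tol_deg=0.5, min_frames=None):
    """Return (start, end) index pairs of runs where the arm holds still.

    `actions` is a list of 6-element joint targets in degrees. A frame counts
    as still when no joint moved more than `tol_deg` since the previous frame.
    `end` is exclusive.
    """
    if min_frames is None:
        # 0.5 s at 15 Hz is 7.5 frames; round down to 7 to be lenient.
        min_frames = math.floor(PAUSE_SECONDS * CONTROL_HZ)
    pauses, run_start = [], None
    for i in range(1, len(actions)):
        still = all(
            abs(a - b) <= tol_deg for a, b in zip(actions[i], actions[i - 1])
        )
        if still and run_start is None:
            run_start = i - 1  # the run begins at the previous frame
        elif not still and run_start is not None:
            if i - run_start >= min_frames:
                pauses.append((run_start, i))
            run_start = None
    # Close a run that lasts until the end of the sequence.
    if run_start is not None and len(actions) - run_start >= min_frames:
        pauses.append((run_start, len(actions)))
    return pauses

# Synthetic example: move, hold the same target, then move again.
demo = (
    [[2.0 * k] * 6 for k in range(5)]
    + [[8.0] * 6 for _ in range(8)]
    + [[10.0 + 2.0 * k] * 6 for k in range(3)]
)
print(find_pauses(demo))
```

A check like this could serve as a sanity filter on long-horizon episodes before training, though the right tolerance depends on servo noise in the real recordings.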

Data structure and features

The dataset follows the LeRobot v3.0 format, with Parquet files storing telemetry data and MP4 files storing video streams.

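Data features

Feature	Type	Description
`action`	`float32[6]`	Target positions for the 6 servo motors (shoulder pan, lift, elbow, wrist flex, roll, gripper).
`observation.state`	`float32[6]`	Current proprioceptive state (joint positions, in degrees).
`observation.images.camera1`	`video`	Wrist camera video stream (1280x720 @ 15 fps).
`observation.images.camera2`	`video`	Global phone camera video stream (640x360 @ 15 fps).
`task_index`	`int64`	Index that maps to a language instruction in `meta/tasks.parquet`.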
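
The feature table can be turned into a lightweight sanity check on individual frames. The sketch below is plain Python and assumes a frame is a dict whose vector fields are list-like; `validate_frame`, `named_joints`, and the joint labels in `JOINT_NAMES` are hypothetical names chosen to mirror the table, not identifiers taken from the dataset or from LeRobot.

```python
# Illustrative joint labels, following the order given in the feature table.
JOINT_NAMES = [
    "shoulder_pan", "shoulder_lift", "elbow",
    "wrist_flex", "wrist_roll", "gripper",
]
VECTOR_KEYS = ("action", "observation.state")

def validate_frame(frame):
    """Return a list of problems found in a frame dict (empty if none)."""
    problems = []
    for key in VECTOR_KEYS:
        if key not in frame:
            problems.append(f"missing key: {key}")
            continue
        values = list(frame[key])
        if len(values) != len(JOINT_NAMES):
            problems.append(
                f"{key}: expected {len(JOINT_NAMES)} values, got {len(values)}"
            )
    if "task_index" not in frame:
        problems.append("missing key: task_index")
    elif int(frame["task_index"]) < 0:
        problems.append("task_index must be non-negative")
    return problems

def named_joints(vector):
    """Pair each value of a 6-element vector with its joint label."""
    return dict(zip(JOINT_NAMES, vector))

frame = {"action": [0.0] * 6, "observation.state": [0.0] * 6, "task_index": 0}
print(validate_frame(frame))
print(named_joints(frame["action"]))
```

Frames loaded through LeRobot carry tensors rather than lists, so a real check would likely convert them first (for example with `.tolist()`).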

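Technical details

Visual grounding and challenges

Transparency mitigation: To handle the difficulty of perceiving transparent plastic cups, "visual anchors" are used, including a white sticker with a black-bordered letter "D".
Spatial diversity: Episodes vary cup placement and camera angle to prevent overfitting to fixed coordinates.
Temporal consistency: Data was collected at a steady 15 Hz cadence, keeping actions and images synchronized within a window of about 66 ms.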
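
The roughly 66 ms window follows directly from the control rate: one step at 15 Hz lasts 1/15 s, or about 66.7 ms. The sketch below shows how that bound might be checked against per-frame timestamps. It assumes timestamps are given in seconds as plain lists; `within_sync_window` and `max_jitter` are hypothetical helpers, and the dataset card does not state how synchronization was actually verified.

```python
CONTROL_HZ = 15
STEP_SECONDS = 1.0 / CONTROL_HZ  # about 0.0667 s, i.e. roughly 66.7 ms

def within_sync_window(action_ts, image_ts, window=STEP_SECONDS):
    """True if every action/image timestamp pair differs by less than one step."""
    if len(action_ts) != len(image_ts):
        raise ValueError("timestamp lists must have equal length")
    return all(abs(a - b) < window for a, b in zip(action_ts, image_ts))

def max_jitter(timestamps, step=STEP_SECONDS):
    """Largest deviation of consecutive timestamp gaps from the nominal step."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return max((abs(g - step) for g in gaps), default=0.0)

print(f"Nominal step: {STEP_SECONDS * 1000:.1f} ms")
print(within_sync_window([0.0, 0.0667], [0.010, 0.080]))
```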

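Action space

The action space is continuous and represents the absolute angular positions of the servo motors. Gripper values typically fall between 20 and 40 degrees for a firm grasp, and at 60 degrees or above for release.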
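
The gripper ranges quoted above can be expressed as a small classifier. This is a minimal sketch: the function name `gripper_state` and the "undetermined" label for values outside the two stated ranges are assumptions, since the dataset card only describes the typical grasp and release ranges.

```python
def gripper_state(angle_deg):
    """Classify a gripper angle (degrees) using the ranges quoted above."""
    if 20.0 <= angle_deg <= 40.0:
        return "grasp"      # typical range for a firm grasp
    if angle_deg >= 60.0:
        return "release"    # 60 degrees or above
    return "undetermined"   # ranges not covered by the dataset card

for angle in (25.0, 50.0, 72.0):
    print(angle, gripper_state(angle))
```

Because the thresholds are described as typical rather than strict, a label derived this way is best treated as a heuristic, for example when segmenting episodes into grasp and release phases.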

Usage

```python
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Load the dataset
dataset = LeRobotDataset("di-techinnova/so-arm-101-pouring-0.1")

# Access the first frame
frame = dataset[0]
image = frame["observation.images.camera1"]
state = frame["observation.state"]
action = frame["action"]
print(f"Instruction: {dataset.get_task(frame['task_index'])}")
```

Citation

If you use this dataset in your research, please cite it as follows:

```bibtex
@misc{di-techinnova/so-arm-101-pouring-0.1,
  author = {Data Impact VN - Technology Innovation Department},
  title = {SO-ARM-101 Pouring Seeds and Coffee Dataset for VLA Training},
  year = {2026},
  publisher = {Hugging Face},
  journal = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.co/datasets/di-techinnova/so-arm-101-pouring-0.1}}
}
```