
naive-bench

Hugging Face · updated 2026-07-01 · indexed 2026-07-01
Download link:
https://huggingface.co/datasets/binhpham/naive-bench
Description:

naive-bench (bin sorting dataset) is a language-conditioned teleoperation dataset for bin sorting with a single SO-101 robot arm. It was collected for Project 2: Naive Bench, a hobbyist-oriented benchmark that compares robot manipulation policies (such as ACT and VLAs) under fixed compute and data budgets.

The task is designed to separate VLAs from ACT/WAMs. The bins are arbitrarily named targets with no color-matching rule, so the only way to route a bar to the correct bin is to read both slots in the prompt: the bar color and the bin color. A policy that ignores the instruction cannot succeed, which makes the evaluation score a direct measure of language grounding.

The dataset contains 312 episodes and 115,378 frames in LeRobotDataset v3.0 format. Control uses six joint positions, in this order: shoulder_pan.pos, shoulder_lift.pos, elbow_flex.pos, wrist_flex.pos, wrist_roll.pos, gripper.pos. There are two cameras, one mounted on the arm and one overhead, each recording 240×320 RGB video (AV1-encoded, 30 fps).

Four (bar, bin) combinations are deliberately held out of training to test compositionality rather than memorization: blue bar → orange bin, red bar → white bin, yellow bar → blue bin, and green bar → orange bin. The other 20 combinations appear in the training data.

The evaluation protocol has several tiers (T0 to T4), each with different scene and command settings. Scoring gives partial credit by outcome: for example, +1.0 for placing the named bar in the named bin and +0.5 for placing it in the wrong bin.

The dataset is loaded with the LeRobotDataset class, and it was collected through human teleoperation using a human-in-the-loop (HITL) recorder.
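The joint order, held-out split, and scoring rule described above can be captured in a few lines of plain Python. This is a minimal sketch, not part of the dataset or of LeRobot: every name in it (`JOINT_NAMES`, `HELD_OUT`, `Outcome`, `CREDIT`, `joints_to_dict`, `is_held_out`, `mean_score`, `split_scores`) is hypothetical. Only the +1.0 and +0.5 credits are stated in the description; the 0.0 credit for any other outcome is an assumption, because the full rubric is not given. Color pairs are stored as (bar color, bin color). The block also works through the average episode length implied by the frame count, episode count, and frame rate.

```python
from enum import Enum
from typing import Optional, Sequence

# Joint order as listed in the dataset description (6-dim control vector).
JOINT_NAMES = (
    "shoulder_pan.pos",
    "shoulder_lift.pos",
    "elbow_flex.pos",
    "wrist_flex.pos",
    "wrist_roll.pos",
    "gripper.pos",
)

# The four held-out (bar color, bin color) pairs from the description.
HELD_OUT = frozenset({
    ("blue", "orange"),
    ("red", "white"),
    ("yellow", "blue"),
    ("green", "orange"),
})

# Dataset-level statistics: average episode length in frames and seconds.
EPISODES, FRAMES, FPS = 312, 115_378, 30
AVG_FRAMES = FRAMES / EPISODES   # about 369.8 frames per episode
AVG_SECONDS = AVG_FRAMES / FPS   # about 12.3 s per episode


def joints_to_dict(vector: Sequence[float]) -> dict:
    """Attach joint names to a 6-element joint-position vector."""
    if len(vector) != len(JOINT_NAMES):
        raise ValueError(f"expected {len(JOINT_NAMES)} values, got {len(vector)}")
    return dict(zip(JOINT_NAMES, vector))


def is_held_out(bar_color: str, bin_color: str) -> bool:
    """True if this (bar, bin) pair is one of the held-out compositions."""
    return (bar_color, bin_color) in HELD_OUT


class Outcome(Enum):
    # Hypothetical labels; the description names only the first two outcomes.
    NAMED_BIN = "named bar placed in named bin"
    WRONG_BIN = "named bar placed in wrong bin"
    OTHER = "any other outcome"


# +1.0 and +0.5 come from the description; 0.0 for OTHER is an assumption.
CREDIT = {Outcome.NAMED_BIN: 1.0, Outcome.WRONG_BIN: 0.5, Outcome.OTHER: 0.0}


def mean_score(outcomes: Sequence[Outcome]) -> Optional[float]:
    """Average credit over trials; None when there are no trials."""
    if not outcomes:
        return None
    return sum(CREDIT[o] for o in outcomes) / len(outcomes)


def split_scores(trials):
    """Mean credit on seen vs held-out pairs.

    Each trial is a (bar_color, bin_color, Outcome) tuple.
    """
    seen = [o for bar, bin_, o in trials if not is_held_out(bar, bin_)]
    held = [o for bar, bin_, o in trials if is_held_out(bar, bin_)]
    return mean_score(seen), mean_score(held)


if __name__ == "__main__":
    trials = [
        ("red", "blue", Outcome.NAMED_BIN),   # pair seen in training
        ("red", "white", Outcome.WRONG_BIN),  # held-out pair
    ]
    print(split_scores(trials))  # (1.0, 0.5)
```

Reporting seen and held-out scores separately is what makes the split informative: a policy that had only memorized the 20 training pairs could score well on seen pairs yet poorly on the four held-out ones.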
Provider:
binhpham
Created:
2026-07-01