RoVI Book
Source: arXiv. Indexed 2025-09-30.
Download link:
https://robotic-visual-instruction.github.io/
Description:
RoVI Book is a dataset of 15,000 image-text question-answer pairs for robotic manipulation tasks with spatiotemporal constraints. Each pair consists of an initial task observation image annotated with RoVI, a simple default prompt query, and an answer generated by GPT-4o. The answer covers four parts: a RoVI analysis, the task name, fine-grained planning steps, and a Python function. Of the tasks, 64% are single-step and 36% are multi-step. They involve five basic manipulation skills: moving an object, rotating an object, picking up, opening a drawer/cabinet, and closing a drawer/cabinet. The dataset also includes enhanced visual variants and retains the original semantic task names from the Open-X Embodiment dataset.
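As a rough illustration of the structure described above, the following Python sketch models one question-answer pair and works out the approximate single-step/multi-step counts implied by the stated percentages. The class name, field names, and helper function are hypothetical and are not taken from the dataset's actual files or schema; the percentages may also be rounded, so the counts should be read as approximate.

```python
from dataclasses import dataclass
from typing import List, Tuple

# The five basic skills named in the description (labels are paraphrased).
SKILLS = (
    "move object",
    "rotate object",
    "pick up",
    "open drawer/cabinet",
    "close drawer/cabinet",
)


@dataclass
class RoVIBookRecord:
    """Hypothetical layout of one pair; field names are assumptions."""
    image_path: str            # initial task observation annotated with RoVI
    prompt: str                # simple default prompt query
    rovi_analysis: str         # GPT-4o answer: RoVI analysis
    task_name: str             # GPT-4o answer: task name
    planning_steps: List[str]  # GPT-4o answer: fine-grained planning steps
    python_function: str       # GPT-4o answer: Python function, as source text


def approximate_split(total: int = 15_000, single_step_pct: int = 64) -> Tuple[int, int]:
    """Return (single_step, multi_step) counts implied by the percentages."""
    single = total * single_step_pct // 100
    return single, total - single


if __name__ == "__main__":
    single, multi = approximate_split()
    print(f"single-step: about {single}, multi-step: about {multi}")
```

Under these assumptions, 64% of 15,000 comes to about 9,600 single-step pairs and 36% to about 5,400 multi-step pairs.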



