
Data from: A careful examination of large behavior models for multitask dexterous manipulation

Source: DataCite Commons. Updated 2026-04-20; indexed 2026-04-25.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.xd2547dxc
Description:
Robot manipulation has seen tremendous progress in recent years, with imitation learning policies enabling successful performance of dexterous and hard-to-model tasks. Concurrently, scaling data and model size has led to the development of capable language and vision foundation models, motivating large-scale efforts to create general-purpose robot foundation models. While these models have garnered significant enthusiasm and investment, meaningful evaluation of real-world performance remains a challenge, both limiting the pace of development and inhibiting a nuanced understanding of current capabilities. In this paper, we rigorously evaluate multitask robot manipulation policies, referred to as Large Behavior Models (LBMs), by extending the Diffusion Policy paradigm across a corpus of simulated and real-world robot data. We propose and validate an evaluation pipeline to rigorously analyze the capabilities of these models with statistical confidence. We compare against single-task baselines through blind, randomized trials in a controlled setting, using both simulation and real-world experiments. We find that multitask pretraining makes the policies more successful and robust, and enables teaching.

Provider:
Dryad
Created:
2026-04-07