Data from: A careful examination of large behavior models for multitask dexterous manipulation
Source: DataCite Commons | Updated: 2026-04-20 | Indexed: 2026-04-25
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.xd2547dxc
Description:
Robot manipulation has seen tremendous progress in recent years, with
imitation learning policies enabling successful performance of dexterous
and hard-to-model tasks. Concurrently, scaling data and model size has led
to the development of capable language and vision foundation models,
motivating large-scale efforts to create general-purpose robot foundation
models. While these models have garnered significant enthusiasm and
investment, meaningful evaluation of real-world performance remains a
challenge, both limiting the pace of development and inhibiting a nuanced
understanding of current capabilities. In this paper, we rigorously
evaluate multitask robot manipulation policies, referred to as Large
Behavior Models (LBMs), by extending the Diffusion Policy paradigm across
a corpus of simulated and real-world robot data. We propose and validate
an evaluation pipeline to rigorously analyze the capabilities of these
models with statistical confidence. We compare against single-task
baselines through blind, randomized trials in a controlled setting, using
both simulation and real-world experiments. We find that multitask
pretraining makes the policies more successful and robust, and enables
teaching.
Provider:
Dryad
Created:
2026-04-07



