play-to-policy

Name: play-to-policy
Creator: OpenDataLab
Published: 2026-05-24 13:30:43
License: No description available

OpenDataLab. Updated: 2026-05-24. Indexed: 2024-05-09.

Download link:

https://opendatalab.org.cn/OpenDataLab/play_to_policy

Resource description:

While large-scale sequence modeling from offline data has led to impressive performance gains in natural language generation and image generation, directly translating these ideas to robotics has been challenging. One key reason is that uncurated robot demonstration data (i.e., play data) collected from non-expert human demonstrators is often noisy, diverse, and distributionally multi-modal. This makes extracting useful, task-centric behaviors from such data a difficult generative modeling problem. In this work, we present Conditional Behavior Transformer (C-BeT), a method that combines the multi-modal generation ability of Behavior Transformer with future-conditioned goal specification. On a suite of simulated benchmark tasks, we find that C-BeT improves upon prior state-of-the-art work in learning from play data by an average of 45.7%. Further, we demonstrate for the first time that useful task-centric behaviors can be learned on a real-world robot purely from play data, without any task labels or reward information.

Provided by:

OpenDataLab

Created:

2023-10-23

Collection summary

Dataset introduction