
NYU ROT

Indexed from OpenXLab on 2026-04-18
Download link:
https://openxlab.org.cn/datasets/OpenDataLab/NYU_ROT
Description:
Imitation learning holds great promise for efficiently learning policies for complex decision-making problems. Current state-of-the-art algorithms often employ inverse reinforcement learning (IRL), in which, given a set of expert demonstrations, an agent alternately infers a reward function and the associated optimal policy. However, such IRL methods typically require extensive online interaction on complex control problems. In this work, we propose Regularized Optimal Transport (ROT), a novel imitation learning algorithm that builds on recent advances in optimal-transport-based trajectory matching. Our key technical insight is that adaptively combining trajectory-matching rewards with behavior cloning can significantly accelerate imitation even with only a few demonstrations. Our experiments on 20 visual control tasks across the DeepMind Control Suite, the OpenAI Robotics Suite, and the Meta-World benchmark show that ROT reaches 90% of expert performance an average of 7.8× faster than existing state-of-the-art methods. In real-world robotic manipulation, with only one demonstration and one hour of online training, ROT achieves an average success rate of 90.1% across 14 tasks.
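To make the idea of trajectory-matching rewards concrete, the following is a minimal sketch, not the ROT implementation. It assumes observations are already encoded as feature vectors, uses a cosine cost, and computes an entropy-regularized transport plan with plain Sinkhorn iterations. The function names (`sinkhorn_plan`, `ot_rewards`, `mix_objectives`), the parameters `eps` and `n_iters`, and the convex blending of losses are all hypothetical choices for illustration. The description above does not specify how the behavior-cloning weight is adapted, so it is left here as a plain argument.

```python
import numpy as np


def sinkhorn_plan(cost, eps=0.05, n_iters=200):
    """Entropy-regularized transport plan between two uniform distributions."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform mass on agent steps
    b = np.full(m, 1.0 / m)          # uniform mass on expert steps
    kernel = np.exp(-cost / eps)     # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):         # alternate the two marginal projections
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


def ot_rewards(agent_feats, expert_feats, eps=0.05):
    """Per-step rewards: negative transport cost assigned to each agent step."""
    # Cosine cost between every agent step and every expert step.
    a = agent_feats / (np.linalg.norm(agent_feats, axis=1, keepdims=True) + 1e-12)
    e = expert_feats / (np.linalg.norm(expert_feats, axis=1, keepdims=True) + 1e-12)
    cost = 1.0 - a @ e.T
    plan = sinkhorn_plan(cost, eps=eps)
    return -(plan * cost).sum(axis=1)


def mix_objectives(rl_loss, bc_loss, bc_weight):
    """Hypothetical blend of an RL loss and a behavior-cloning loss.

    bc_weight in [0, 1] is assumed to be supplied by some adaptive schedule.
    """
    return (1.0 - bc_weight) * rl_loss + bc_weight * bc_loss
```

In this sketch an agent trajectory that tracks the expert closely receives rewards near zero, while one that drifts away receives more negative rewards, which is the signal a downstream reinforcement learning step would maximize.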
Provider:
OpenDataLab
Created:
2023-10-23