
SUPERSEDED - Official Data Repository for DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design

Source: DataCite Commons. Updated: 2024-07-09. Indexed: 2024-07-13.
Download link:
https://datashare.ed.ac.uk/handle/10283/8807
Description:
This item has been superseded by the version available at https://doi.org/10.7488/ds/7769.

Autonomous agents trained using deep reinforcement learning (RL) often fail to generalise to new environments, even when these environments share characteristics with the ones encountered during training. In this work, we investigate how the sampling of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents. We discover that, for deep actor-critic architectures sharing their base layers, prioritising levels according to their value loss minimises the mutual information between the agent's internal representation and the set of training levels in the generated training data. This provides a novel theoretical justification for the regularisation achieved by certain adaptive sampling strategies. We then turn our attention to unsupervised environment design (UED) methods, which assume control over level generation. We find that existing UED methods can significantly shift the training distribution, which translates to low ZSG performance. To prevent both overfitting and distributional shift, we introduce data-regularised environment design (DRED). DRED generates levels using a generative model trained to approximate the ground truth distribution of an initial set of level parameters. Through its grounding, DRED achieves significant improvements in ZSG over adaptive level sampling strategies and UED methods. Our code and experimental data are available at https://github.com/uoe-agents/dred (Garcin et al., "DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design", 2024).

Provided by:
University of Edinburgh. College of Science & Engineering. School of Informatics. Autonomous Agents Research Group
Created:
2024-06-26