Offline Reinforcement Learning Based on Prioritized Sampling Model

Name: Offline Reinforcement Learning Based on Prioritized Sampling Model
Creator: Science Data Bank
Published: 2025-04-27 21:23:45
License: No description available

DataCite Commons: updated 2025-04-27, indexed 2025-04-16

Download link:

https://www.scidb.cn/detail?dataSetId=63c5778682f54f778af437f910c5f340

Description:

Offline reinforcement learning algorithms keep the learned policy close to the behavior policy by reducing distribution shift, but the data distribution of the offline experience buffer often directly affects the quality of the learned policy. This paper proposes two offline prioritized sampling models, one based on temporal difference (TD) error and one based on martingales, to improve the training of reinforcement learning agents. The TD error-based sampling model makes the agent learn more often from experience data whose value estimates are inaccurate, so that it handles possible out-of-distribution states by estimating more accurate value functions. The martingale-based sampling model makes the agent learn more often from positive samples that benefit policy optimization and reduces the impact of negative samples on value function iteration. Each sampling model is then combined with batch-constrained deep Q-learning (BCQ), yielding TD error-based prioritized BCQ and martingale-based prioritized BCQ. Experimental results on the D4RL and TORCS datasets show that the two proposed sampling models can selectively pick the experience data that are conducive to value function estimation or policy optimization, and thereby obtain higher returns.
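The TD error-based idea can be illustrated with a short sketch. This is not the authors' implementation: it assumes the common proportional prioritization form, where a transition's sampling probability is proportional to (|TD error| + eps) ** alpha, and the function names, `eps`, `alpha`, and `gamma` values are hypothetical choices made for illustration. The martingale-based model is not sketched because the description above does not define its priority.

```python
import numpy as np


def td_error_probabilities(rewards, q_values, next_q_values, dones,
                           gamma=0.99, eps=1e-3, alpha=0.6):
    """Return sampling probabilities for a fixed offline buffer.

    Assumption (not from the source): proportional prioritization,
    p_i proportional to (|delta_i| + eps) ** alpha, where delta_i is
    the one-step TD error of transition i.
    """
    # One-step TD target; terminal transitions do not bootstrap.
    targets = rewards + gamma * (1.0 - dones) * next_q_values
    td_errors = targets - q_values
    # eps keeps zero-error transitions sampleable; alpha controls how
    # strongly large errors are favored (alpha = 0 gives uniform sampling).
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()


def sample_indices(probabilities, batch_size, rng):
    """Draw buffer indices with replacement according to the probabilities."""
    return rng.choice(len(probabilities), size=batch_size, replace=True,
                      p=probabilities)


if __name__ == "__main__":
    # Toy buffer of four transitions with made-up value estimates.
    rewards = np.array([1.0, 0.0, 0.0, 1.0])
    q_values = np.array([1.0, 0.5, 2.0, 0.0])
    next_q_values = np.array([0.0, 0.5, 0.0, 0.0])
    dones = np.array([1.0, 0.0, 0.0, 1.0])

    probs = td_error_probabilities(rewards, q_values, next_q_values, dones)
    batch = sample_indices(probs, batch_size=8, rng=np.random.default_rng(0))
    print(probs, batch)
```

In an offline setting the buffer is fixed, so under this sketch the probabilities would be recomputed periodically as the value estimates change; how often, and how the priorities interact with BCQ's constrained action selection, is left open here.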

Provider:

Science Data Bank

Created:

2023-04-11
