The Data of Offline Reinforcement Learning Based on Prioritized Sampling Model

Name: The Data of Offline Reinforcement Learning Based on Prioritized Sampling Model
Creator: China University of Mining and Technology
Published: 2024-01-05 00:00:00
License: No description available

Source: Science Data Bank; updated 2024-01-05; indexed 2026-04-23

Download link:

https://www.scidb.cn/detail?dataSetId=8dd7789e5ed745ec90514682531464d8

Description:

Offline reinforcement learning algorithms reduce distribution shift by keeping the learned policy close to the behavior policy, but the data distribution of the offline experience buffer often directly affects the quality of the learned policy. In this paper, two offline prioritized sampling models, one based on temporal difference error and one based on martingales, are proposed to improve the training of reinforcement learning agents. The temporal difference error-based sampling model lets the agent learn more often from experience data whose value estimates are inaccurate, so that it handles possible out-of-distribution states by estimating value functions more accurately. The martingale-based sampling model lets the agent learn more often from positive samples that benefit policy optimization, and reduces the impact of negative samples on value function iteration. Each of the proposed sampling models is then combined with batch-constrained deep Q-learning (BCQ), yielding temporal difference error-based prioritized BCQ and martingale-based prioritized BCQ. Experimental results on the D4RL and TORCS datasets show that the two proposed offline prioritized sampling models selectively draw the experience data that are conducive to value function estimation or policy optimization, and thereby obtain higher rewards.
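The idea behind the temporal difference error-based sampling model can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes the proportional prioritization scheme commonly used in prioritized experience replay, and the function names `td_error_priorities` and `sample_indices`, the exponent `alpha`, and the offset `eps` are hypothetical choices made for illustration only.

```python
import random


def td_error_priorities(td_errors, alpha=0.6, eps=1e-6):
    """Convert TD errors into sampling probabilities.

    alpha and eps are illustrative assumptions, not values from the dataset
    description: alpha controls how strongly large errors are favored, and
    eps keeps zero-error transitions from never being sampled.
    """
    scaled = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_indices(probs, batch_size, rng):
    """Draw buffer indices with replacement, in proportion to their priority."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


# Toy offline buffer of four transitions with made-up TD errors.
toy_td_errors = [0.1, 2.0, 0.5, 0.0]
toy_probs = td_error_priorities(toy_td_errors)
toy_batch = sample_indices(toy_probs, batch_size=8, rng=random.Random(0))
```

In this sketch the transition with the largest absolute TD error receives the highest sampling probability, which matches the description's goal of revisiting experience whose value estimate is inaccurate. The martingale-based model is not sketched here because the description does not state how its priorities are computed.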

Provider:

China University of Mining and Technology

Created:

2024-01-04
