MuJoCo locomotion benchmark tasks
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/Stilwell-Git/Randomized-Return-Decomposition
Resource description:
This dataset is a benchmark suite for evaluating episodic reinforcement learning algorithms. In each task, the agent receives no reward signal in non-terminal states and obtains episodic feedback only at the end of the trajectory. The tasks are long-horizon, with a maximum trajectory length of 1000 steps, and are intended for studying episodic reinforcement learning under sparse, delayed rewards.
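The episodic-feedback setting described above can be illustrated with a minimal Python sketch. It is not taken from the linked repository. The function name `to_episodic_rewards` and the constant `MAX_EPISODE_STEPS` are hypothetical, and the choice of the summed per-step reward as the episodic feedback is an assumption, since the description does not say how the feedback is computed.

```python
from typing import List, Sequence

# Maximum trajectory length stated in the description.
MAX_EPISODE_STEPS = 1000


def to_episodic_rewards(step_rewards: Sequence[float]) -> List[float]:
    """Delay all reward to the final step of one trajectory.

    Assumption: the episodic feedback equals the sum of the per-step
    rewards; the source only says feedback arrives at the end.
    """
    if not step_rewards:
        return []
    if len(step_rewards) > MAX_EPISODE_STEPS:
        raise ValueError("trajectory exceeds the 1000-step horizon")
    # Non-terminal steps carry no signal.
    delayed = [0.0] * len(step_rewards)
    # The last step carries the whole episodic feedback.
    delayed[-1] = float(sum(step_rewards))
    return delayed
```

Under that assumption, a three-step trajectory with per-step rewards 1, 2 and 3 becomes two zero-reward steps followed by a single terminal reward of 6.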
Provider:
OpenAI Gym



