Randomly Generated MDPs
arXiv, indexed 2025-09-30
Download link:
https://github.com/indujohniisc/GSQL
Description:
This dataset contains 100 randomly generated Markov decision processes (MDPs). Each MDP has 10 states and 5 actions. For every state-action pair (i, a), the self-transition probability P(i|i,a) is strictly positive, and rewards are bounded. The dataset is used to compare the GSQL1 and GSQL2 algorithms against Q-learning, Fast Q-learning, and Double Q-learning. The description also lists state-space cardinalities of 10, 50, 100, 500, and 1000 for the 100 MDPs. The intended task is the evaluation of reinforcement learning algorithms.
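The properties listed above (10 states, 5 actions, P(i|i,a) > 0, bounded rewards) can be illustrated with a minimal sketch. This is not the authors' generation procedure, which is not described here. The function name `random_mdp`, the uniform sampling scheme, the reward bound of 1.0, and the seeding are all assumptions made for illustration. Sampling strictly positive weights makes every transition probability positive, which is stronger than the stated self-transition condition but guarantees it.

```python
import random


def random_mdp(n_states=10, n_actions=5, reward_bound=1.0, seed=0):
    """Hypothetical generator: returns (P, R).

    P[i][a][j] is the probability of moving from state i to state j
    under action a; R[i][a] is the reward for taking action a in state i.
    """
    rng = random.Random(seed)
    P, R = [], []
    for i in range(n_states):
        P_i, R_i = [], []
        for a in range(n_actions):
            # rng.random() lies in [0, 1); the small offset keeps every
            # weight strictly positive, so P(i|i,a) > 0 in particular.
            weights = [rng.random() + 1e-6 for _ in range(n_states)]
            total = sum(weights)
            P_i.append([w / total for w in weights])
            # Rewards are bounded by construction (assumed bound).
            R_i.append(rng.uniform(-reward_bound, reward_bound))
        P.append(P_i)
        R.append(R_i)
    return P, R


# 100 MDPs, one per seed, matching the count in the description.
mdps = [random_mdp(seed=k) for k in range(100)]
```

Passing a different `n_states` (for example 50 or 100) yields larger instances with the same properties.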
Provided by:
Authors of the paper



