LightZero
Source: arXiv · Updated: 2023-10-12 · Indexed: 2024-06-21
Download link:
https://github.com/opendilab/LightZero
Resource description:
LightZero is the first unified benchmark for evaluating Monte Carlo Tree Search (MCTS)/MuZero algorithms in general sequential decision-making scenarios, jointly developed by SenseTime and Shanghai AI Laboratory. It covers 9 algorithms and more than 20 decision environments, and its detailed evaluations reveal the potential of these methods for building scalable and efficient decision-making intelligence. LightZero's design lets developers focus on customizing environments and algorithms, while techniques such as off-policy correction and a data throughput limiter ensure stable convergence and faster runtime. LightZero also explores the benefits of combining model-based reinforcement learning with MCTS, aiming to resolve the misalignment between state representation learning and dynamics learning and thereby accelerate convergence and improve data efficiency.
Provided by:
SenseTime
Created:
2023-10-12