SmartPlay
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/microsoft/SmartPlay
Resource description:
SmartPlay is a benchmark for evaluating large language models (LLMs) as intelligent agents. It comprises six distinct games, including Rock-Paper-Scissors, Tower of Hanoi, and Minecraft, which together provide up to 20 evaluation settings. Beyond assessing an agent's overall performance in a rigorous test environment, SmartPlay supports a separate analysis of nine key capabilities of LLM-based agents.
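Game-based agent benchmarks of this kind rely on environments that hand the agent a text observation, accept a text action, and mechanically check whether the action is legal and whether the goal has been reached. The sketch below illustrates that idea for Tower of Hanoi only. It is hypothetical: `HanoiEnv`, `step`, `solve`, the `"move <src> <dst>"` action format, and the reward values are illustrative assumptions, not SmartPlay's actual interface or scoring (consult the repository for those).

```python
"""Hypothetical sketch of a text-based Tower of Hanoi environment.

NOT the SmartPlay API: all names, the action format, and rewards are assumptions.
"""
from typing import List, Optional, Tuple


class HanoiEnv:
    """Three pegs (0, 1, 2); disks are ints, a larger int is a larger disk."""

    def __init__(self, n_disks: int = 3) -> None:
        self.n_disks = n_disks
        self.reset()

    def reset(self) -> str:
        # All disks start on peg 0, largest at the bottom.
        self.pegs: List[List[int]] = [list(range(self.n_disks, 0, -1)), [], []]
        self.moves = 0
        return self.observe()

    def observe(self) -> str:
        # Plain-text observation, as an LLM agent would receive it.
        return " | ".join(f"peg {i}: {p}" for i, p in enumerate(self.pegs))

    def solved(self) -> bool:
        return len(self.pegs[2]) == self.n_disks

    @staticmethod
    def _parse(action: str) -> Optional[Tuple[int, int]]:
        # Accept only "move <src> <dst>" with pegs in 0..2 and src != dst.
        parts = action.split()
        if len(parts) != 3 or parts[0] != "move":
            return None
        try:
            src, dst = int(parts[1]), int(parts[2])
        except ValueError:
            return None
        if not (0 <= src <= 2 and 0 <= dst <= 2) or src == dst:
            return None
        return src, dst

    def step(self, action: str) -> Tuple[str, float, bool]:
        """Illegal or unparsable actions get reward -1 and leave the state unchanged."""
        parsed = self._parse(action)
        if parsed is None:
            return self.observe(), -1.0, False
        src, dst = parsed
        if not self.pegs[src] or (self.pegs[dst] and self.pegs[dst][-1] < self.pegs[src][-1]):
            return self.observe(), -1.0, False  # empty source or larger-on-smaller
        self.pegs[dst].append(self.pegs[src].pop())
        self.moves += 1
        done = self.solved()
        return self.observe(), (1.0 if done else 0.0), done


def solve(n: int, src: int = 0, aux: int = 1, dst: int = 2) -> List[str]:
    """Classic recursive solution: the optimal 2**n - 1 moves, as a reference agent."""
    if n == 0:
        return []
    return solve(n - 1, src, dst, aux) + [f"move {src} {dst}"] + solve(n - 1, aux, src, dst)


if __name__ == "__main__":
    env = HanoiEnv(3)
    print(env.reset())  # peg 0: [3, 2, 1] | peg 1: [] | peg 2: []
    for a in solve(3):
        obs, reward, done = env.step(a)
    print(done, env.moves)  # True 7
```

Comparing an agent's move count and number of illegal actions against the optimal `2**n - 1` reference is one simple way such an environment could yield a checkable score; whether SmartPlay scores Tower of Hanoi this way is not stated here.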
Provided by:
Microsoft



