ZeroSumEval Game Outcomes
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md
Description:
This dataset records head-to-head results between Llama 3 models of different sizes in competitive games such as chess and debate, and shows how model size correlates with performance in those games. It covers multiple models and supports the task of evaluating large language models through competitive play.
Provided by:
Authors of the paper



