GAMA(γ)-Bench

Name: GAMA(γ)-Bench
Creator: CUHK-ARISE
License: No description available

Indexed from arXiv: 2025-09-30

Download link:

https://github.com/CUHK-ARISE/GAMABench

Resource description:

GAMA(γ)-Bench is a framework for evaluating the game-playing ability of large language models (LLMs) in multi-agent environments. It covers eight classic game-theory scenarios and uses a dynamic scoring scheme: game settings are configurable, and the scoring adjusts to the chosen game parameters. The reported evaluation covers twelve LLMs from six model families. The core task is to assess how well LLMs make decisions in multi-agent game scenarios.
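The description does not give the scoring formula. Purely as a hypothetical illustration of how a score can adapt to game parameters, the Python sketch below scores an assumed "guess a fraction of the average" scenario. `GuessGameConfig`, `equilibrium`, and `score` are invented names, not part of the GAMABench repository. The sketch divides the distance between the players' mean guess and the game's equilibrium by the configured guessing range, so the score stays on a 0-100 scale when the range changes.

```python
from dataclasses import dataclass


@dataclass
class GuessGameConfig:
    """Hypothetical settings for a 'guess a fraction of the average' game (not GAMABench's API)."""
    ratio: float = 2 / 3      # winning target = ratio * average of all guesses
    min_value: float = 0.0    # lowest allowed guess
    max_value: float = 100.0  # highest allowed guess


def equilibrium(cfg: GuessGameConfig) -> float:
    """Equilibrium guess, assuming 0 <= ratio < 1 and 0 <= min_value < max_value."""
    if not 0 <= cfg.ratio < 1:
        raise ValueError("this sketch only handles 0 <= ratio < 1")
    if not 0 <= cfg.min_value < cfg.max_value:
        raise ValueError("this sketch requires 0 <= min_value < max_value")
    # The target is only a fraction of the average, so iterated reasoning
    # drives every guess down to the minimum allowed value.
    return cfg.min_value


def score(guesses: list[float], cfg: GuessGameConfig) -> float:
    """Hypothetical 0-100 score: 100 if the mean guess sits at the equilibrium,
    0 if it is as far away as the configured range allows."""
    if not guesses:
        raise ValueError("need at least one guess")
    eq = equilibrium(cfg)
    span = cfg.max_value - cfg.min_value  # normalizer derived from the game parameters
    mean_guess = sum(guesses) / len(guesses)
    return 100 * (1 - abs(mean_guess - eq) / span)


# Proportionally identical behavior under a 10x larger range gets the same score.
print(score([50, 40, 30], GuessGameConfig()))
print(score([500, 400, 300], GuessGameConfig(max_value=1000.0)))
```

Normalizing by the range is only one possible design choice. A real benchmark might instead score each player individually or measure against a different reference point.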

Provided by:

CUHK-ARISE