LLM Performance in Strategic Randomization
Source: arXiv, indexed 2025-09-30
Download link:
https://github.com/ocelopus/llm-when-to-throw-coin
Description:
This dataset evaluates five large language models (LLMs) with different architectures and capabilities on a roulette-style game that requires strategic randomization. It contains win-loss matrices and Bayes factor analyses under three prompt conditions: framed, neutral, and hinted. The five LLMs are compared over multiple rounds in several tournaments to assess their strategic reasoning and their ability to randomize.
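The page does not say how the Bayes factors were computed or how the win-loss matrices are laid out. As one illustration of a Bayes factor on head-to-head results, the sketch below uses a Beta-Binomial test of H1: p ~ Beta(1, 1) against H0: p = 0.5, where p is the probability that one model beats another. The matrix layout `wins[i][j]`, the function names, the uniform prior, and the exclusion of draws are all assumptions made for illustration. They are not the dataset's file format or the authors' method.

```python
import math
from typing import Dict, List, Sequence, Tuple


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b), computed via lgamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bayes_factor_10(wins: int, losses: int, a: float = 1.0, b: float = 1.0) -> float:
    """Bayes factor BF10 for H1: p ~ Beta(a, b) against H0: p = 0.5.

    p is the probability that the first model beats the second. Draws are
    assumed to have been dropped before counting (an assumption).
    """
    n = wins + losses
    # Marginal likelihood under H1; the binomial coefficient is common to
    # both hypotheses and cancels in the ratio, so it is omitted.
    log_m1 = log_beta(wins + a, losses + b) - log_beta(a, b)
    # Likelihood under H0, where every game is a fair coin flip.
    log_m0 = n * math.log(0.5)
    return math.exp(log_m1 - log_m0)


def pairwise_bayes_factors(
    models: Sequence[str], wins: List[List[int]]
) -> Dict[Tuple[str, str], float]:
    """Compute BF10 for every unordered pair of models.

    wins[i][j] is the (hypothetical) number of games models[i] won against models[j].
    """
    result: Dict[Tuple[str, str], float] = {}
    for i in range(len(models)):
        for j in range(i + 1, len(models)):
            result[(models[i], models[j])] = bayes_factor_10(wins[i][j], wins[j][i])
    return result


if __name__ == "__main__":
    # Toy counts for illustration only; these are NOT taken from the dataset.
    models = ["model_a", "model_b", "model_c"]
    wins = [
        [0, 7, 5],
        [3, 0, 6],
        [5, 4, 0],
    ]
    for pair, bf in pairwise_bayes_factors(models, wins).items():
        print(pair, f"BF10 = {bf:.3f}")
```

Working in log space keeps the computation stable for long tournaments. Under this uniform prior, a 7-3 record gives BF10 below 1. This shows how Bayes factors from short head-to-head series can remain inconclusive even when one side wins clearly more often.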



