LLM Performance in Strategic Randomization

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/ocelopus/llm-when-to-throw-coin

Description:

This dataset evaluates five large language models (LLMs) with different architectures and capabilities on a roulette-style game that requires strategic randomization. The models are compared over multiple rounds across several tournaments, and the dataset contains the resulting win-loss matrices and Bayes factor analyses for three prompt conditions: framed, neutral, and suggestive. The goal is to assess LLM performance in strategic reasoning and randomization.
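The description does not say how the Bayes factors were computed. As a rough illustration of what such an analysis can look like for one cell of a win-loss matrix, the sketch below compares H0 (win rate fixed at 0.5) against H1 (win rate drawn from a uniform prior). The function name `bayes_factor_10`, the choice of prior, and the example counts are assumptions made for illustration and are not taken from the dataset or its repository.

```python
import math


def bayes_factor_10(wins: int, games: int) -> float:
    """Bayes factor for H1 (win rate ~ Uniform(0, 1)) against H0 (win rate = 0.5).

    Assumed model: each game is an independent Bernoulli trial.
    """
    if not 0 <= wins <= games:
        raise ValueError("wins must be between 0 and games")
    # Marginal likelihood under H1 with a uniform prior is 1 / (games + 1).
    log_m1 = -math.log(games + 1)
    # Likelihood under H0 is C(games, wins) * 0.5 ** games.
    log_m0 = math.log(math.comb(games, wins)) + games * math.log(0.5)
    return math.exp(log_m1 - log_m0)


# Hypothetical head-to-head record, not taken from the dataset.
wins, games = 14, 20
print(f"BF10 = {bayes_factor_10(wins, games):.2f}")
```

Values above 1 favor a win rate different from 0.5 and values below 1 favor parity. A different prior or a different model of the games would give different numbers.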
