Simulated Policy Debates

Source: arXiv (indexed 2025-09-30)

Download link:

https://github.com/pradyuprasad/llms_overconfidence

Resource description:

This dataset comprises 60 simulated policy debates among 10 state-of-the-art large language models (LLMs). After each round of a debate, the models report their confidence that they will win. The dataset reveals significant patterns of overconfidence and confidence escalation among LLMs in zero-sum debate settings. In terms of scale, 10 LLMs took part across the 60 debates. The tasks involve dynamic, adversarial debate combined with confidence self-assessment.
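The two patterns named above can be made concrete with a small sketch. In a zero-sum debate exactly one side wins, so stated win confidence averaged over both debaters should sit at 50%; anything above that is overconfidence, and a rise from the first round to the last is escalation. The record layout, the side names, and the numbers below are hypothetical and chosen only for illustration; they are not taken from the repository.

```python
from statistics import mean

# Hypothetical layout: each debater's stated confidence (0-100) of winning,
# recorded after each round. The values are made up, not from the dataset.
debate = {
    "proposition": [70, 75, 80],
    "opposition": [65, 70, 85],
}


def mean_confidence_per_round(debate):
    """Average the debaters' stated confidence for each round."""
    return [mean(pair) for pair in zip(*debate.values())]


def overconfidence_per_round(debate, baseline=50.0):
    """Gap between mean stated confidence and the 50% zero-sum baseline."""
    return [m - baseline for m in mean_confidence_per_round(debate)]


def escalation(debate):
    """Change in mean stated confidence from the first round to the last."""
    per_round = mean_confidence_per_round(debate)
    return per_round[-1] - per_round[0]


print(overconfidence_per_round(debate))
print(escalation(debate))
```

A positive value from `overconfidence_per_round` means the two sides jointly claim more than 100% of the win probability; a positive value from `escalation` means that joint claim grew as the debate went on.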
