
Simulated Policy Debates

Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/pradyuprasad/llms_overconfidence
Description:
This dataset comprises 60 simulated policy debates among 10 state-of-the-art Large Language Models (LLMs). After each round of a debate, each model reports its self-assessed confidence that it will win. The dataset reveals significant patterns of overconfidence and confidence escalation among LLMs in zero-sum debate settings. In terms of scale, 10 LLMs took part across a total of 60 debates. The tasks involve dynamic, adversarial debate combined with confidence assessment.
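Because a debate is zero-sum, the two sides' win probabilities should sum to 100%, so any excess above that indicates joint overconfidence. The following is a minimal sketch of how that excess and the round-to-round escalation might be measured. The record layout, the field names `pro` and `con`, the 0-100 scale, and all numbers are hypothetical placeholders, not the dataset's actual schema or values.

```python
from statistics import mean

# Hypothetical records: one entry per debate, holding each side's
# self-reported win confidence (0-100) after every round.
# Field names and numbers are illustrative, not taken from the dataset.
debates = [
    {"pro": [60, 70, 80], "con": [55, 65, 75]},
    {"pro": [50, 55, 65], "con": [60, 60, 70]},
]

def zero_sum_excess(debate):
    """Per-round amount by which the two confidences together exceed 100."""
    return [p + c - 100 for p, c in zip(debate["pro"], debate["con"])]

def escalation(confidences):
    """Change in one side's confidence from the first round to the last."""
    return confidences[-1] - confidences[0]

# Average excess per round across debates (positive means joint overconfidence).
excess_by_round = [mean(r) for r in zip(*(zero_sum_excess(d) for d in debates))]

# Average first-to-last-round confidence change across all debaters.
mean_escalation = mean(
    escalation(d[side]) for d in debates for side in ("pro", "con")
)

print(excess_by_round, mean_escalation)
```

With real data, the records would be loaded from the repository linked above, and the field names adjusted to whatever format the files actually use.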