declare-lab/KAIROS_EVAL

Source: Hugging Face. Updated 2025-08-31; indexed 2025-09-13.
Download link:
https://hf-mirror.com/datasets/declare-lab/KAIROS_EVAL
Overview:
KAIROS_EVAL is a benchmark dataset for evaluating the robustness of large language models (LLMs) in multi-agent social interaction scenarios. For each model, it captures the model's original belief (answer plus confidence) and then builds that model's evaluation setting dynamically by simulating peer influence from artificial agents of varying reliability. The dataset supports multiple-choice QA, robustness evaluation, and utility and resistance analysis. It covers four domains (reasoning, knowledge, common sense, and creativity) and contains 10,000 training instances and 3,000 test instances.
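The utility and resistance analysis mentioned above can be illustrated with a minimal sketch. The definitions used here are assumptions for illustration, not taken from this listing: "resistance" is read as the share of initially correct answers a model keeps after seeing peer responses, and "utility" as the share of initially wrong answers it corrects. The `Trial` record and its field names are hypothetical and do not reflect the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One hypothetical evaluation record (field names are illustrative)."""
    gold: str      # ground-truth option
    original: str  # model's answer before seeing peer responses
    final: str     # model's answer after seeing peer responses


def resistance(trials):
    """Share of initially correct answers that remain correct (assumed definition)."""
    kept = [t.final == t.gold for t in trials if t.original == t.gold]
    return sum(kept) / len(kept) if kept else float("nan")


def utility(trials):
    """Share of initially wrong answers that become correct (assumed definition)."""
    fixed = [t.final == t.gold for t in trials if t.original != t.gold]
    return sum(fixed) / len(fixed) if fixed else float("nan")


# Toy records, made up for the example; not drawn from the dataset.
trials = [
    Trial(gold="A", original="A", final="A"),  # correct, stays correct
    Trial(gold="B", original="B", final="C"),  # correct, swayed to a wrong answer
    Trial(gold="C", original="D", final="C"),  # wrong, corrected by peers
    Trial(gold="A", original="B", final="B"),  # wrong, stays wrong
]

print("resistance:", resistance(trials))
print("utility:", utility(trials))
```

Both functions return NaN when there is nothing to measure (for example, no initially correct answers), so an empty subset is not mistaken for a score of zero.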
Provider:
declare-lab