MBZUAI/SocialMaze
Source: Hugging Face. Updated 2025-05-14. Indexed 2025-05-31.
Download link:
https://hf-mirror.com/datasets/MBZUAI/SocialMaze
Description:
SocialMaze is part of a benchmark for evaluating social reasoning in large language models (LLMs). This dataset contains the Hidden Role Deduction task, which is considered one of the most challenging scenarios for testing complex social reasoning, deception handling, and inferential capabilities in LLMs. The data is curated into a question-answering (QA) format for direct evaluation, and each instance represents a unique game scenario. It comes in two difficulty configurations: an easy split with 6 players and a hard split with 10 players. Beyond direct model evaluation, the provided reasoning-process field supports error analysis, the development of chain-of-thought (step-by-step) prompting strategies, and use as training data for fine-tuning models to improve their deductive social reasoning.
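
The direct-evaluation use described above can be sketched as a simple exact-match scoring loop. This is a minimal sketch under stated assumptions: the field names `question` and `answer`, the toy records, and the `toy_model` placeholder are all hypothetical and are not taken from the dataset card. The actual column and configuration names should be checked against the dataset on Hugging Face before use.

```python
from typing import Callable, Iterable, Mapping


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are not counted as errors."""
    return " ".join(text.lower().split())


def exact_match_accuracy(
    records: Iterable[Mapping[str, str]],
    predict: Callable[[str], str],
    question_key: str = "question",  # hypothetical field name
    answer_key: str = "answer",      # hypothetical field name
) -> float:
    """Return the fraction of records whose predicted answer matches the reference answer."""
    total = 0
    correct = 0
    for record in records:
        total += 1
        prediction = predict(record[question_key])
        if normalize(prediction) == normalize(record[answer_key]):
            correct += 1
    # An empty input yields 0.0 rather than a division error.
    return correct / total if total else 0.0


# Toy stand-ins for dataset rows (invented for illustration, not real SocialMaze data).
toy_records = [
    {"question": "Scenario A: which player holds the hidden role?", "answer": "Player 3"},
    {"question": "Scenario B: which player holds the hidden role?", "answer": "Player 5"},
]


def toy_model(question: str) -> str:
    """Placeholder for an LLM call; it always returns the same answer."""
    return "player 3"


accuracy = exact_match_accuracy(toy_records, toy_model)
print(f"exact-match accuracy: {accuracy:.2f}")
```

Exact match is only one possible scoring rule. If answers are free-form text rather than short labels, a more lenient comparison or an answer-extraction step would likely be needed.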
Provided by:
MBZUAI



