BrainBench

Name: BrainBench
Creator: BrainGPT LoveLab
License: 暂无描述

arXiv2025-09-30 收录

下载链接：

https://github.com/braingpt-lovelab/BrainBench

下载链接

链接失效反馈

官方服务：

资源简介：

该数据集包含了评估人类参与者与语言模型在决策任务中协作效果的性能指标和LLM困惑度得分。此外，该数据集在研究中被用于分析人类与机器判断的互补性，重点关注信心校准和分类多样性。该数据集的任务是评估在决策任务中，人类与语言模型有效协作的条件。

This dataset includes performance metrics for assessing the collaborative effectiveness between human participants and large language models (LLMs) in decision-making tasks, as well as LLM perplexity scores. Additionally, this dataset was utilized in the study to analyze the complementarity between human and machine judgments, with a focus on confidence calibration and classification diversity. The task of this dataset is to evaluate the conditions under which humans and language models can collaborate effectively in decision-making tasks.

提供机构：

BrainGPT LoveLab

5,000+

优质数据集

54 个

任务类型

进入经典数据集