BrainBench
收藏arXiv2025-09-30 收录
下载链接:
https://github.com/braingpt-lovelab/BrainBench
下载链接
链接失效反馈官方服务:
资源简介:
该数据集包含了评估人类参与者与语言模型在决策任务中协作效果的性能指标和LLM困惑度得分。此外,该数据集在研究中被用于分析人类与机器判断的互补性,重点关注信心校准和分类多样性。该数据集的任务是评估在决策任务中,人类与语言模型有效协作的条件。
This dataset includes performance metrics for assessing the collaborative effectiveness between human participants and large language models (LLMs) in decision-making tasks, as well as LLM perplexity scores. Additionally, this dataset was utilized in the study to analyze the complementarity between human and machine judgments, with a focus on confidence calibration and classification diversity. The task of this dataset is to evaluate the conditions under which humans and language models can collaborate effectively in decision-making tasks.
提供机构:
BrainGPT LoveLab



