five

BrainBench

收藏
arXiv2025-09-30 收录
下载链接:
https://github.com/braingpt-lovelab/BrainBench
下载链接
链接失效反馈
官方服务:
资源简介:
该数据集包含了评估人类参与者与语言模型在决策任务中协作效果的性能指标和LLM困惑度得分。此外,该数据集在研究中被用于分析人类与机器判断的互补性,重点关注信心校准和分类多样性。该数据集的任务是评估在决策任务中,人类与语言模型有效协作的条件。

This dataset includes performance metrics for assessing the collaborative effectiveness between human participants and large language models (LLMs) in decision-making tasks, as well as LLM perplexity scores. Additionally, this dataset was utilized in the study to analyze the complementarity between human and machine judgments, with a focus on confidence calibration and classification diversity. The task of this dataset is to evaluate the conditions under which humans and language models can collaborate effectively in decision-making tasks.
提供机构:
BrainGPT LoveLab
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作