EQ-Bench

Name: EQ-Bench
Creator: maas
Published: 2026-01-08 12:47:39
License: not specified

Source: ModelScope community. Updated: 2026-01-08. Indexed: 2025-12-06.

Download link:

https://modelscope.cn/datasets/evalscope/EQ-Bench

Description:

# EQ-Bench

This is the EQ-Bench v2 English dataset, all credit to Samuel J. Paech.

---

Title: `EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models`

Abstract: https://arxiv.org/abs/2312.06281

EQ-Bench is a benchmark for language models designed to assess emotional intelligence.

Why emotional intelligence? One reason is that it represents a subset of abilities that are important for the user experience, and which isn't explicitly tested by other benchmarks. Another reason is that it's not trivial to improve scores by fine tuning for the benchmark, which makes it harder to "game" the leaderboard.

EQ-Bench is a little different from traditional psychometric tests. It uses a specific question format, in which the subject has to read a dialogue then rate the intensity of possible emotional responses of one of the characters. Every question is interpretative and assesses the ability to predict the magnitude of the 4 presented emotions. The test is graded without the need for a judge (so there is no length bias). It's cheap to run (only 171 questions), and produces results that correlate strongly with human preference (Arena ELO) and multi-domain benchmarks like MMLU.

Homepage: https://eqbench.com/

### Citation

```bibtex
@misc{paech2023eqbench,
  title={EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models},
  author={Samuel J. Paech},
  year={2023},
  eprint={2312.06281},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
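### Illustrative scoring sketch

The description above says each question asks for intensity ratings of 4 emotions and is graded without a judge. A minimal sketch of how such judge-free grading could work is given below. It is not the official EQ-Bench v2 scoring formula, which is defined in the paper. The 0-10 rating scale, the function name `score_question`, the emotion labels, and the "10 minus summed absolute error" rule are all assumptions made for illustration.

```python
from typing import Dict


def score_question(predicted: Dict[str, float], reference: Dict[str, float]) -> float:
    """Score one question by comparing predicted and reference intensity ratings.

    Hypothetical rule (NOT the official EQ-Bench formula): start from 10 and
    subtract the summed absolute rating error, with a floor of 0.
    """
    if set(predicted) != set(reference):
        raise ValueError("predicted and reference must rate the same emotions")
    total_error = sum(abs(predicted[name] - reference[name]) for name in reference)
    return max(0.0, 10.0 - total_error)


# Hypothetical reference and model ratings for one question (assumed 0-10 scale).
reference = {"Surprise": 2, "Anger": 7, "Relief": 0, "Embarrassment": 5}
predicted = {"Surprise": 3, "Anger": 6, "Relief": 0, "Embarrassment": 7}

print(score_question(predicted, reference))  # errors 1 + 1 + 0 + 2 = 4, so 6.0
```

Because the score depends only on numeric distance from the reference ratings, the length or style of the model's response plays no part in it, which is the property the description refers to as having no length bias.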


Provider: maas
Created: 2025-12-03
