EmoBench

Source: Hugging Face. Updated 2024-11-30. Indexed 2024-12-12.

Download link:

https://huggingface.co/datasets/SahandSab/EmoBench

Resource summary:

EmoBench is a comprehensive and challenging benchmark for evaluating the emotional intelligence (EI) of large language models (LLMs). It contains 400 handcrafted scenarios in English and Chinese, divided into two evaluation tasks: emotional understanding and emotional application. The emotional understanding task requires identifying the emotions in a complex scenario and their causes; the emotional application task requires recommending an effective emotional response or action in an emotional dilemma. The dataset is designed on the basis of psychological theories, supports bilingual scenarios, and includes complex emotional dilemmas that require reasoning and perspective-taking. Its high-quality multi-label annotations were validated through strict inter-annotator agreement testing (Fleiss' Kappa = 0.852).

Created:

2024-11-25

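Original information summary

EmoBench dataset overview

Overview

EmoBench is a comprehensive and challenging benchmark for evaluating the emotional intelligence (EI) of large language models (LLMs). Beyond emotion recognition, it covers higher-level EI abilities such as emotional reasoning and application.

Dataset structure

The dataset contains 400 handcrafted scenarios, divided into two main evaluation tasks:

Emotional Understanding (EU): identify the emotions in a complex scenario and their causes.
Emotional Application (EA): recommend an effective emotional response or action in an emotionally conflicted situation.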

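Key features

Psychology-based design: grounded in established theories of emotional intelligence (e.g., Salovey & Mayer, Goleman).
Bilingual support: scenarios are provided in English and Chinese.
Challenging scenarios: complex emotional dilemmas that require reasoning and perspective-taking.
High-quality annotation: validated through strict inter-annotator agreement (Fleiss' Kappa = 0.852).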
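The card reports only the agreement value, not how it was computed. As a reminder of what the statistic measures, here is a minimal sketch of the standard Fleiss' kappa formula in Python. The function name and the input layout (one row per item, one column per category, each cell holding the number of annotators who chose that category) are assumptions made for illustration, and the sketch does not address how multi-label annotations were handled for EmoBench.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    annotators who assigned item i to category j (hypothetical layout)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two annotators")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of annotations")

    n_categories = len(counts[0])
    total = n_items * n_raters

    # Share of all annotations that fell into each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Observed agreement per item: fraction of annotator pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    if p_e == 1.0:
        return float("nan")             # undefined when only one category is ever used
    return (p_bar - p_e) / (1.0 - p_e)


# Toy table: 4 items, 3 annotators, 2 categories (made-up numbers).
toy_counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(round(fleiss_kappa(toy_counts), 3))
```

Values near 1 indicate agreement well above chance, so a reported 0.852 is conventionally read as strong agreement.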

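Detailed dataset structure

Emotional Understanding

Categories: complex emotions, emotional cues, personal beliefs and experiences, perspective-taking.
Example:
- Scenario: After a long day of terrible events, Sam's car breaks down and he starts laughing hysterically.
- Task: Identify the emotion (e.g., sadness, joy) and its cause.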

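Emotional Application

Categories: divided by relationship type (personal, social), problem type (self, others), and question type (response, action).
Example:
- Scenario: Rebecca's son lost his soccer game and is upset and blaming himself.
- Task: Identify the most effective response or action.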
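The three axes listed for Emotional Application combine into a small grid of categories. The sketch below enumerates that grid, reading the third axis as question type (response vs. action). The lowercase identifiers are illustrative, and the card does not say whether every combination is actually populated in the dataset.

```python
from itertools import product

# Axis values as listed in the dataset card; the identifiers are illustrative.
RELATIONSHIP_TYPES = ("personal", "social")
PROBLEM_TYPES = ("self", "others")
QUESTION_TYPES = ("response", "action")

# Cartesian product of the three axes: 2 x 2 x 2 possible categories.
ea_categories = list(product(RELATIONSHIP_TYPES, PROBLEM_TYPES, QUESTION_TYPES))

for relationship, problem, question in ea_categories:
    print(f"{relationship}/{problem}/{question}")
```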

Evaluation

For the evaluation code, visit the GitHub repository (https://github.com/Sahandfer/EmoBench).
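The card does not describe the scoring rule, so the following is only a sketch of how accuracy on the Emotional Understanding task could be computed, not the repository's evaluation code. It assumes each item has one gold emotion and one gold cause, and that an answer counts as correct only when both match. The class name, field names, and the `require_both` switch are hypothetical, and the example labels are made up.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EULabel:
    """Hypothetical container for one Emotional Understanding answer."""
    emotion: str
    cause: str


def eu_accuracy(
    predictions: Sequence[EULabel],
    gold: Sequence[EULabel],
    require_both: bool = True,
) -> float:
    """Accuracy over EU items.

    require_both=True scores an item 1 only if emotion AND cause match
    (an assumed strict rule); otherwise each part is worth half a point.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("no items to score")

    total = 0.0
    for pred, ref in zip(predictions, gold):
        emotion_ok = pred.emotion == ref.emotion
        cause_ok = pred.cause == ref.cause
        if require_both:
            total += 1.0 if (emotion_ok and cause_ok) else 0.0
        else:
            total += (int(emotion_ok) + int(cause_ok)) / 2
    return total / len(gold)


# Made-up labels, not taken from the dataset.
gold = [EULabel("relief", "exam cancelled"), EULabel("guilt", "forgot a promise")]
predictions = [EULabel("relief", "exam cancelled"), EULabel("guilt", "lost a game")]

print(eu_accuracy(predictions, gold))                      # 0.5
print(eu_accuracy(predictions, gold, require_both=False))  # 0.75
```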

Citation

If this dataset is useful for your research, please cite the following paper:

@inproceedings{sabour-etal-2024-emobench,
    title = "{E}mo{B}ench: Evaluating the Emotional Intelligence of Large Language Models",
    author = "Sabour, Sahand and Liu, Siyang and Zhang, Zheyuan and Liu, June and Zhou, Jinfeng and Sunaryo, Alvionna and Lee, Tatia and Mihalcea, Rada and Huang, Minlie",
    editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.326",
    doi = "10.18653/v1/2024.acl-long.326",
    pages = "5986--6004",
    abstract = "Recent advances in Large Language Models (LLMs) have highlighted the need for robust, comprehensive, and challenging benchmarks. Yet, research on evaluating their Emotional Intelligence (EI) is considerably limited. Existing benchmarks have two major shortcomings: first, they mainly focus on emotion recognition, neglecting essential EI capabilities such as emotion management and thought facilitation through emotion understanding; second, they are primarily constructed from existing datasets, which include frequent patterns, explicit information, and annotation errors, leading to unreliable evaluation. We propose EmoBench, a benchmark that draws upon established psychological theories and proposes a comprehensive definition for machine EI, including Emotional Understanding and Emotional Application. EmoBench includes a set of 400 hand-crafted questions in English and Chinese, which are meticulously designed to require thorough reasoning and understanding. Our findings reveal a considerable gap between the EI of existing LLMs and the average human, highlighting a promising direction for future research. Our code and data are publicly available at https://github.com/Sahandfer/EmoBench.",
}

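Collected summary

Dataset introduction

Construction

EmoBench is built on psychological theory, in particular the emotional intelligence theories of Salovey & Mayer and Goleman. It comprises 400 handcrafted scenarios in English and Chinese. The scenarios go beyond emotion recognition to cover emotional reasoning and application, with the aim of evaluating the emotional intelligence (EI) of large language models (LLMs) comprehensively. The dataset is split into two main tasks, Emotional Understanding (EU) and Emotional Application (EA), and each task was validated through strict inter-annotator agreement checks to ensure data quality.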

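Features

The distinguishing feature of EmoBench is its grounding in psychology, which makes the scenarios complex and challenging enough to probe how deeply LLMs understand and apply emotional intelligence. The dataset is bilingual (English and Chinese), which supports cross-lingual research. The scenarios emphasize emotional nuance and perspective-taking, so models need advanced emotional reasoning to solve them. The high-quality multi-label annotations were validated through strict inter-annotator agreement checks, which supports the reliability and accuracy of the data.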

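Usage

EmoBench is mainly used to evaluate and improve the emotional intelligence of large language models, and is particularly suited to the two core tasks of emotional understanding and emotional application. Researchers can obtain the evaluation code from the GitHub repository and use the dataset to train and test models. The dataset has a clear structure, with detailed scenario descriptions and task requirements, which simplifies experiment design and result analysis. When citing the dataset, refer to the citation information provided.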

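Background and challenges

Background overview

EmoBench was proposed by Sahand Sabour et al. in 2024 to evaluate the emotional intelligence (EI) of large language models (LLMs). It is based on psychological theories of emotional intelligence, in particular those of Salovey & Mayer and Goleman, and comprises 400 handcrafted scenarios in English and Chinese. EmoBench covers emotional reasoning and application in addition to emotion recognition, and aims to address the shortcomings of existing datasets for evaluating emotional intelligence. It points to a new research direction for improving the emotional intelligence of future LLMs.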

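Current challenges

The challenges around EmoBench fall into three areas. First, evaluating emotional intelligence involves complex emotional reasoning and application, which demands strong emotional understanding and perspective-taking; existing models still show a considerable gap here. Second, designing 400 handcrafted scenarios and annotating them with high-quality multi-label annotations ensures complexity and diversity, but also makes the dataset harder to build. Third, evaluating emotional intelligence across languages adds further difficulty for models.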

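Common scenarios

Classic use cases

The classic use of EmoBench is to evaluate the emotional intelligence (EI) of large language models (LLMs). With 400 carefully designed scenarios covering the two tasks of emotional understanding and emotional application, EmoBench tests a model's ability to recognize, reason about, and respond to emotions in complex situations. In the emotional understanding task, the model must identify and explain emotions and their causes; in the emotional application task, it must recommend an effective emotional response or course of action.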
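To make the evaluation setup concrete, the sketch below formats one scenario as a multiple-choice prompt for a model. It is an illustration under stated assumptions, not the prompt used by the official evaluation code: the function name, the argument names, and the prompt wording are hypothetical, and the answer options in the example are made up rather than taken from the dataset. Only the scenario text comes from the card's own example.

```python
from string import ascii_uppercase
from typing import Sequence


def build_prompt(scenario: str, question: str, choices: Sequence[str]) -> str:
    """Format a scenario, a question, and lettered options as one prompt string."""
    if not 2 <= len(choices) <= len(ascii_uppercase):
        raise ValueError("expected between 2 and 26 choices")

    lines = [f"Scenario: {scenario}", f"Question: {question}", "Options:"]
    # Label the options A, B, C, ... in the order given.
    lines += [f"{letter}. {choice}" for letter, choice in zip(ascii_uppercase, choices)]
    lines.append("Answer with the letter of the best option.")
    return "\n".join(lines)


prompt = build_prompt(
    scenario="Rebecca's son lost his soccer game and is upset and blaming himself.",
    question="What is the most effective response for Rebecca?",
    choices=[  # made-up options for illustration only
        "Tell him the loss does not matter.",
        "Acknowledge his feelings and remind him that one game does not define him.",
        "Point out the mistakes he made during the game.",
    ],
)
print(prompt)
```

Keeping the option order explicit makes it straightforward to map a model's letter answer back to a choice index when scoring.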

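Practical applications

In practice, EmoBench is relevant to several areas that depend on emotional intelligence. In mental health, a model can analyze a user's emotional state and offer appropriate emotional support to help the user cope with distress. In education, a model can recognize students' emotional needs and offer personalized emotional guidance. In customer service, a model can better understand customers' emotional needs and provide a more humane service.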

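Derived related work

According to this summary, the release of EmoBench has prompted a substantial amount of follow-up research, although no specific works are named. Some researchers have built new emotional intelligence evaluation frameworks on top of EmoBench that refine the metrics for emotional understanding and emotional application. Other teams have used the dataset to train and optimize emotionally intelligent models, improving performance in complex emotional scenarios. The bilingual nature of EmoBench has also encouraged cross-lingual research on emotional intelligence and the development of multilingual models.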

The content above was collected and summarized by 遇见数据集.
