SemEvalWorkshop/emo

Name: SemEvalWorkshop/emo
Creator: SemEvalWorkshop
Published: 2024-08-25 08:08:02
License: No description available

Hugging Face | Updated 2024-08-25 | Indexed 2024-06-15

Download link:

https://hf-mirror.com/datasets/SemEvalWorkshop/emo

Resource overview:

In this dataset, each instance is a textual dialogue: an utterance together with its two preceding turns of context. The task is to infer the underlying emotion of the utterance by choosing one of four emotion classes: Happy, Sad, Angry, or Others.

Provider:

SemEvalWorkshop

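Summary of original information

Dataset overview

Dataset description

Dataset summary

The dataset contains textual dialogues, each consisting of an utterance and its two preceding turns of context. The goal is to infer the underlying emotion of the utterance, choosing from four emotion classes: Happy, Sad, Angry, and Others.

Supported tasks and leaderboards

The task supported by this dataset is emotion classification.

Languages

The dataset is in English.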

Dataset structure

Data instances

An example from the training set:

{ "label": 0, "text": "dont worry im girl hmm how do i know if you are whats ur name" }

Data fields

The data fields are the same across all splits.

emo2019

text: a string feature.
label: a classification label, with possible values others (0), happy (1), sad (2), angry (3).
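
The label ids listed above can be turned into readable names with a small lookup table. The following is a minimal sketch, not part of the dataset's own tooling: the names `ID2LABEL`, `LABEL2ID`, and `decode_instance` are hypothetical and introduced only for illustration, and the instance is the training example quoted on this page.

```python
import json

# Label ids as listed in the field description on this page.
ID2LABEL = {0: "others", 1: "happy", 2: "sad", 3: "angry"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_instance(raw: str) -> dict:
    """Parse one JSON instance and attach the readable label name."""
    record = json.loads(raw)
    record["label_name"] = ID2LABEL[record["label"]]
    return record


# The training example quoted on this page.
example = '{ "label": 0, "text": "dont worry im girl hmm how do i know if you are whats ur name" }'
decoded = decode_instance(example)
print(decoded["label_name"], "|", decoded["text"])
```

Note that the text field holds the three dialogue turns as a single string; this page does not document a separator between turns, so the sketch leaves the text untouched.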

Data splits

name	train	test
emo2019	30160	5509

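Dataset creation

Curation rationale

The dataset was created to advance research on emotion detection in text.

Source data

The source data comes from user interactions with a conversational agent.

Annotations

The annotations were produced by experts.

Personal and sensitive information

The dataset does not contain personal or sensitive information.

Considerations for using the data

Social impact of the dataset

The dataset may benefit the development of emotion analysis and dialogue systems.

Discussion of biases

The dataset may contain biases, such as an uneven distribution across emotion classes.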
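
One way to check for an uneven class distribution is to count the labels and, if needed, derive per-class weights for training. The sketch below is illustrative only: the function names are hypothetical, the inverse-frequency weighting scheme is a common heuristic rather than anything prescribed by the dataset, and `toy_labels` is made-up data, not the real label distribution.

```python
from collections import Counter

# Index in this list corresponds to the label id on this page.
LABEL_NAMES = ["others", "happy", "sad", "angry"]


def class_distribution(labels):
    """Return the fraction of instances per class name."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return {name: counts[i] / total for i, name in enumerate(LABEL_NAMES)}


def inverse_frequency_weights(labels):
    """Return weight_c = N / (K * count_c); absent classes get 0.0."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    k = len(LABEL_NAMES)
    return {
        name: (total / (k * counts[i]) if counts[i] else 0.0)
        for i, name in enumerate(LABEL_NAMES)
    }


# Toy labels, made up for illustration; NOT the real distribution.
toy_labels = [0, 0, 0, 0, 1, 2, 3, 3]
print(class_distribution(toy_labels))
print(inverse_frequency_weights(toy_labels))
```

Rarer classes receive larger weights, which can then be passed to a weighted loss function if the real split turns out to be skewed.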

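Other known limitations

The dataset's license is unknown.

Additional information

Dataset curators

The dataset was created by experts and through crowdsourcing.

Licensing information

The dataset's license is unknown.

Citation information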

@inproceedings{chatterjee-etal-2019-semeval,
  title={SemEval-2019 Task 3: EmoContext Contextual Emotion Detection in Text},
  author={Ankush Chatterjee and Kedhar Nath Narahari and Meghana Joshi and Puneet Agrawal},
  booktitle={Proceedings of the 13th International Workshop on Semantic Evaluation},
  year={2019},
  address={Minneapolis, Minnesota, USA},
  publisher={Association for Computational Linguistics},
  url={https://www.aclweb.org/anthology/S19-2005},
  doi={10.18653/v1/S19-2005},
  pages={39--48},
  abstract={In this paper, we present the SemEval-2019 Task 3 - EmoContext: Contextual Emotion Detection in Text. Lack of facial expressions and voice modulations make detecting emotions in text a challenging problem. For instance, as humans, on reading Why dont you ever text me! we can either interpret it as a sad or angry emotion and the same ambiguity exists for machines. However, the context of dialogue can prove helpful in detection of the emotion. In this task, given a textual dialogue i.e. an utterance along with two previous turns of context, the goal was to infer the underlying emotion of the utterance by choosing from four emotion classes - Happy, Sad, Angry and Others. To facilitate the participation in this task, textual dialogues from user interaction with a conversational agent were taken and annotated for emotion classes after several data processing steps. A training data set of 30160 dialogues, and two evaluation data sets, Test1 and Test2, containing 2755 and 5509 dialogues respectively were released to the participants. A total of 311 teams made submissions to this task. The final leader-board was evaluated on Test2 data set, and the highest ranked submission achieved 79.59 micro-averaged F1 score. Our analysis of systems submitted to the task indicate that Bi-directional LSTM was the most common choice of neural architecture used, and most of the systems had the best performance for the Sad emotion class, and the worst for the Happy emotion class}
}

Contributions

Thanks to @thomwolf, @lordtt13, @lhoestq for adding this dataset.

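Collected summary

Dataset introduction

Construction

The dataset was built in response to the need to recognize emotion in dialogue context. Text from user interactions with a dialogue system was collected and then annotated by experts and crowdworkers, yielding a text dataset with four emotion classes (Happy, Sad, Angry, Others). It comprises a training set and a test set, with a download size of 3.37 MB and a generated size of 2.85 MB, and serves as a resource for emotion classification tasks.

Features

The EmoContext dataset focuses on contextual emotion detection: each instance provides an utterance together with its two preceding turns of dialogue context and the corresponding emotion label. The dataset is monolingual (English), falls in the 10K to 100K size category, and was annotated by both experts and crowdworkers, which is intended to support annotation quality and diversity. It also provides metadata, including the definitions of the data fields and the sizes of the data splits.

Usage

Users can choose the training set or the test set as needed. The data is stored in JSON format, and each instance contains the text and its emotion label. The data can be loaded and preprocessed with the tooling libraries provided by Hugging Face and then used to train and evaluate emotion classification models. Use of the dataset is not restricted to a particular language environment, which makes it convenient for emotion analysis research across a range of application scenarios.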
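
A minimal sketch of both access paths described above follows. It rests on several assumptions: the repository id `SemEvalWorkshop/emo` is taken from this page, `load_dataset` is the standard entry point of the Hugging Face `datasets` library (which needs to be installed and needs network access), and depending on the library version extra arguments may be required that are not shown here. The helpers `load_emo` and `read_json_lines` are hypothetical names, and the one-instance-per-line layout is an assumption about how the JSON is stored.

```python
import json


def load_emo(split: str = "train"):
    """Load one split ("train" or "test") via the Hugging Face `datasets` library.

    Requires `pip install datasets` and network access; not called here so
    that the rest of the sketch runs offline.
    """
    from datasets import load_dataset  # imported lazily on purpose

    return load_dataset("SemEvalWorkshop/emo", split=split)


def read_json_lines(lines):
    """Parse an iterable of JSON strings (one instance per line) into dicts."""
    return [json.loads(line) for line in lines if line.strip()]


# Offline stand-in: the single training example quoted on this page.
sample_lines = [
    '{"label": 0, "text": "dont worry im girl hmm how do i know if you are whats ur name"}',
]
records = read_json_lines(sample_lines)
texts = [r["text"] for r in records]
labels = [r["label"] for r in records]
print(len(records), labels)
```

Calling `load_emo("train")` or `load_emo("test")` would return the corresponding split, from which the `text` and `label` columns can be read in the same way as from `records`.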

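Background and challenges

Background overview

The EmoContext dataset originated at the SemEval 2019 workshop and was proposed by Ankush Chatterjee and colleagues to address emotion classification in text. Dialogue text was collected and annotated for emotion to help machine learning models better understand and predict the emotion expressed in text. The dataset covers four emotion classes: Happy, Sad, Angry, and Others. It supports emotion analysis work in natural language processing, particularly on understanding emotional nuance in contextual dialogue, and has had an important influence on research in the area.

Current challenges

Challenges in constructing the dataset mainly include: first, ensuring the quality and consistency of the annotations, since emotion annotation is subjective; second, handling any personal and sensitive information in the data so that privacy and compliance are preserved. As a research problem, the challenge is to improve model accuracy in contextual emotion understanding, especially when handling ambiguity and subtle differences between emotion classes. Ensuring that models generalize across different cultural and linguistic backgrounds is another important challenge.

Common scenarios

Classic use scenarios

In text classification, the classic use of the SemEvalWorkshop/emo dataset is to classify the emotion expressed in dialogue text, using the dialogue context to determine which emotion the speaker expresses. Specifically, the dataset provides single utterances together with their two preceding turns of context, and requires the model to identify the speaker's emotion among four classes: Happy, Sad, Angry, and Others.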
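
Predictions on this task are typically scored with micro-averaged F1, the metric named in the citation abstract on this page. The sketch below is a plain-Python illustration, not the official scorer: the function name `micro_f1` is hypothetical, the toy gold and predicted labels are made up, and restricting the average to the three emotion classes (happy, sad, angry) by default is an assumption about the evaluation that this page does not confirm.

```python
def micro_f1(gold, pred, classes=(1, 2, 3)):
    """Micro-averaged F1 over the given class ids.

    Default classes are happy (1), sad (2), angry (3); excluding
    others (0) is an ASSUMPTION, not stated on this page.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        for c in classes:
            if p == c and g == c:
                tp += 1  # correctly predicted class c
            elif p == c and g != c:
                fp += 1  # predicted c, but gold is something else
            elif g == c and p != c:
                fn += 1  # gold is c, but it was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels, made up for illustration.
gold = [1, 2, 3, 0, 1, 2]
pred = [1, 2, 0, 3, 2, 2]
print(micro_f1(gold, pred))                         # emotion classes only
print(micro_f1(gold, pred, classes=(0, 1, 2, 3)))   # all four classes
```

When all four classes are included, micro-averaged F1 reduces to plain accuracy, which is why restricting the class set changes the score.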

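Practical applications

In practice, the SemEvalWorkshop/emo dataset can be used to improve the emotion understanding of chatbots, raise the response quality of customer service systems, and refine user emotion analysis models in personalized recommendation systems. By accurately recognizing user emotion, businesses can offer interactions that better match user needs, which can improve user satisfaction and loyalty.

Derived related work

Researchers have built a range of related work on the SemEvalWorkshop/emo dataset, including but not limited to new emotion classification algorithms, deep learning based emotion recognition models, and emotion analysis studies targeting different languages and cultural backgrounds. This work has advanced affective computing and provided data resources and research examples for interdisciplinary emotion research.

The content above was collected and summarized by 遇见数据集.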
