oneonlee/cleansed_emocontext

Name: oneonlee/cleansed_emocontext
Creator: oneonlee
Published: 2024-03-10 10:24:36
License: MPL-2.0

Hugging Face · Updated 2024-03-10 · Indexed 2024-05-25

Download link:

https://hf-mirror.com/datasets/oneonlee/cleansed_emocontext


Resource description:

---
annotations_creators:
- expert-generated
language_creators:
- crowdsourced
license: mpl-2.0
task_categories:
- text-classification
task_ids:
- sentiment-classification
language:
- en
tags:
- conversation
size_categories:
- 10K<n<100K
source_datasets:
- emo
pretty_name: Cleansed_EmoContext
dataset_info:
  features:
  - name: turn1
    dtype: string
  - name: turn2
    dtype: string
  - name: turn3
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          "0": others
          "1": happy
          "2": sad
          "3": angry
  config_name: cleansed_emo2019
  # splits:
  # - name: train
  #   num_bytes: 2433205
  #   num_examples: 30160
  # - name: test
  #   num_bytes: 421555
  #   num_examples: 5509
  # download_size: 3362556
  # dataset_size: 2854760
---

# Dataset Card for "cleansed_emocontext"

- `cleansed_emocontext` is a **cleansed and normalized version** of [`emo`](https://huggingface.co/datasets/emo).
- For cleansing and normalization, [`data_cleansing.py`](https://github.com/oneonlee/cleansed_emocontext/blob/master/helpers/data_cleaning.py) was used, [modifying the code](https://github.com/oneonlee/cleansed_emocontext/commit/c09b020dfb49692a1c5fcd2099d531503d9bb8b5#diff-266912260148f110c4e7fe00b6cdef4c23b024dca8c693a0dd3c83f25ba56f54) provided on the [official EmoContext GitHub](https://github.com/DhruvDh/emocontext).

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [SemEval-2019 Task 3: EmoContext Contextual Emotion Detection in Text](https://aclanthology.org/S19-2005/)
- **Repository:** [More Information Needed](https://github.com/DhruvDh/emocontext)
- **Paper:** [SemEval-2019 Task 3: EmoContext Contextual Emotion Detection in Text](https://aclanthology.org/S19-2005/)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Size of downloaded dataset files:** 3.37 MB
- **Size of the generated dataset:** 2.85 MB
- **Total amount of disk used:** 6.22 MB

### Dataset Summary

In this dataset, given a textual dialogue i.e. an utterance along with two previous turns of context, the goal was to infer the underlying emotion of the utterance by choosing from four emotion classes - Happy, Sad, Angry and Others.
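
The task framing in the summary can be sketched in a few lines. This is a minimal illustration and not part of the original card: the helper name `format_dialogue` and the `" </s> "` separator are hypothetical choices, while the label names and the example turns are taken from the card itself.

```python
from typing import Dict

# Label ids and names as listed in the card's `class_label` metadata.
ID2LABEL: Dict[int, str] = {0: "others", 1: "happy", 2: "sad", 3: "angry"}


def format_dialogue(example: Dict[str, object], sep: str = " </s> ") -> str:
    """Join the two context turns and the final utterance into one string.

    `sep` is an arbitrary, hypothetical separator; use whatever your
    tokenizer or model expects.
    """
    return sep.join(str(example[key]) for key in ("turn1", "turn2", "turn3"))


# Example instance copied from the card's `train` sample.
sample = {
    "label": 0,
    "turn1": "don't worry i'm girl",
    "turn2": "hmm how do i know if you are",
    "turn3": "what's your name ?",
}

text = format_dialogue(sample)      # single input string for a classifier
gold = ID2LABEL[int(sample["label"])]  # human-readable gold label
print(text)
print(gold)
```

Concatenating the turns is only one possible input encoding; a model that consumes the three turns separately would skip this step.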

### Supported Tasks and Leaderboards

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Languages

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Dataset Structure

### Data Instances

#### cleansed_emo2019

An example of 'train' looks as follows.

```
{
    "label": 0,
    "turn1": "don't worry i'm girl",
    "turn2": "hmm how do i know if you are",
    "turn3": "what's your name ?"
}
```

### Data Fields

The data fields are the same among all splits.

#### cleansed_emo2019

- `turn1`, `turn2`, `turn3`: a `string` feature.
- `label`: a classification label, with possible values including `others` (0), `happy` (1), `sad` (2), `angry` (3).

### Data Splits

| name             | train |  dev | test |
| ---------------- | ----: | ---: | ---: |
| cleansed_emo2019 | 30160 | 2755 | 5509 |

## Dataset Creation

### Curation Rationale

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

#### Who are the source language producers?

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Annotations

#### Annotation process

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

#### Who are the annotators?

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Personal and Sensitive Information

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Discussion of Biases

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Other Known Limitations

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Additional Information

### Dataset Curators

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Licensing Information

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Citation Information

```
@inproceedings{chatterjee-etal-2019-semeval,
    title={SemEval-2019 Task 3: EmoContext Contextual Emotion Detection in Text},
    author={Ankush Chatterjee and Kedhar Nath Narahari and Meghana Joshi and Puneet Agrawal},
    booktitle={Proceedings of the 13th International Workshop on Semantic Evaluation},
    year={2019},
    address={Minneapolis, Minnesota, USA},
    publisher={Association for Computational Linguistics},
    url={https://www.aclweb.org/anthology/S19-2005},
    doi={10.18653/v1/S19-2005},
    pages={39--48},
    abstract={In this paper, we present the SemEval-2019 Task 3 - EmoContext: Contextual Emotion Detection in Text. Lack of facial expressions and voice modulations make detecting emotions in text a challenging problem. For instance, as humans, on reading ''Why don't you ever text me!'' we can either interpret it as a sad or angry emotion and the same ambiguity exists for machines. However, the context of dialogue can prove helpful in detection of the emotion. In this task, given a textual dialogue i.e. an utterance along with two previous turns of context, the goal was to infer the underlying emotion of the utterance by choosing from four emotion classes - Happy, Sad, Angry and Others. To facilitate the participation in this task, textual dialogues from user interaction with a conversational agent were taken and annotated for emotion classes after several data processing steps. A training data set of 30160 dialogues, and two evaluation data sets, Test1 and Test2, containing 2755 and 5509 dialogues respectively were released to the participants. A total of 311 teams made submissions to this task. The final leader-board was evaluated on Test2 data set, and the highest ranked submission achieved 79.59 micro-averaged F1 score. Our analysis of systems submitted to the task indicate that Bi-directional LSTM was the most common choice of neural architecture used, and most of the systems had the best performance for the Sad emotion class, and the worst for the Happy emotion class}
}
```
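
The abstract quoted in the citation reports a micro-averaged F1 score. As a rough illustration of that metric (a sketch, not the official scorer), the snippet below pools true positives, false positives, and false negatives across classes before computing precision and recall. Restricting the pooled counts to the three emotion classes and leaving out `others` is an assumption about the evaluation protocol that the card does not state; the function name `micro_f1` and the toy labels are hypothetical.

```python
from typing import Iterable, Sequence

# Assumption: only happy (1), sad (2), angry (3) are scored; `others` (0) is not.
EMOTION_IDS = (1, 2, 3)


def micro_f1(gold: Sequence[int], pred: Sequence[int],
             classes: Iterable[int] = EMOTION_IDS) -> float:
    """Micro-averaged F1 over `classes`, pooling counts across all of them."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scored = set(classes)
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == p and g in scored:
            tp += 1          # correct prediction of a scored class
        else:
            if p in scored:
                fp += 1      # predicted a scored class, but it was wrong
            if g in scored:
                fn += 1      # missed a gold instance of a scored class
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented purely for illustration.
gold_labels = [1, 2, 3, 0, 0, 2]
pred_labels = [1, 2, 0, 0, 3, 1]
score = micro_f1(gold_labels, pred_labels)
print(f"micro-F1 = {score:.2f}")
```

In the toy example there are two true positives, two false positives, and two false negatives, so precision, recall, and F1 are all 0.5.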

Provider:

oneonlee

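Summary of original information

Dataset overview

Dataset name: Cleansed_EmoContext
Dataset alias: cleansed_emo2019
Dataset size: 10K<n<100K
Language: English (en)
Task category: text classification
Specific task: sentiment classification
License: MPL-2.0
Annotation creators: expert-generated
Language creators: crowdsourced
Source dataset: Emo
Dataset features:
- turn1, turn2, turn3: string
- label: class label, one of "others" (0), "happy" (1), "sad" (2), "angry" (3)
Data splits:
- Training set: 30160 examples
- Test set: 5509 examples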
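
As a quick sanity check on the split sizes, the sketch below compares the sizes of a loaded copy against the numbers on this page. The `dev` count of 2755 comes from the Data Splits table of the dataset card rather than from this overview. The helper names are hypothetical, and `load_split_sizes` assumes that the Hugging Face `datasets` library is installed, that the repository ID from the download link loads with default settings, and that the split names match; none of this is confirmed by the page.

```python
from typing import Dict, Mapping

# Split sizes as reported on this page (dev taken from the card's table).
EXPECTED_SIZES: Dict[str, int] = {"train": 30160, "dev": 2755, "test": 5509}


def split_fractions(sizes: Mapping[str, int]) -> Dict[str, float]:
    """Return each split's share of the total number of dialogues."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def size_mismatches(actual: Mapping[str, int],
                    expected: Mapping[str, int] = EXPECTED_SIZES) -> Dict[str, int]:
    """Return {split: actual - expected} for splits whose size differs."""
    return {
        name: actual.get(name, 0) - count
        for name, count in expected.items()
        if actual.get(name, 0) != count
    }


def load_split_sizes(repo_id: str = "oneonlee/cleansed_emocontext") -> Dict[str, int]:
    """Fetch the dataset and count rows per split (needs network access).

    Assumption: the repository loads with `load_dataset(repo_id)`; the split
    names on the Hub may differ from the ones used in EXPECTED_SIZES.
    """
    from datasets import load_dataset  # lazy import keeps the rest offline

    dataset = load_dataset(repo_id)
    return {name: split.num_rows for name, split in dataset.items()}


total_examples = sum(EXPECTED_SIZES.values())
fractions = split_fractions(EXPECTED_SIZES)
print(total_examples)
print({name: round(share, 3) for name, share in fractions.items()})
```

Calling `size_mismatches(load_split_sizes())` would then report any split whose row count deviates from the documented numbers; an empty dictionary means everything matches.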

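Dataset details

Dataset description: This dataset is a cleansed and normalized version of the Emo dataset, intended for emotion classification in textual dialogue. Each instance consists of an utterance together with two preceding turns of context, and the goal is to infer the emotion class of that utterance: happy, sad, angry, or others.
Dataset structure: The dataset contains three text fields (turn1, turn2, turn3) and one label field (label); the label takes one of four emotion classes.
Dataset creation: Cleansing and normalization were performed with a Python script modified from the code provided on the official EmoContext GitHub.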

