jeffnyman/emotions

Name: jeffnyman/emotions
Creator: jeffnyman
Published: 2023-07-29 18:10:20
License: cc-by-sa-4.0

Source: Hugging Face. Updated: 2023-07-29. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/jeffnyman/emotions


Resource overview:

---
pretty_name: Emotions
license: cc-by-sa-4.0
language:
- en
size_categories:
- 10K<n<100K
task_categories:
- text-classification
task_ids:
- multi-class-classification
tags:
- emotion-classification
dataset_info:
- config_name: split
  features:
  - name: text
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          "0": sadness
          "1": joy
          "2": love
          "3": anger
          "4": fear
          "5": surprise
  splits:
  - name: train
    num_bytes: 1741597
    num_examples: 16000
  - name: validation
    num_bytes: 214703
    num_examples: 2000
  - name: test
    num_bytes: 217181
    num_examples: 2000
  download_size: 740883
  dataset_size: 2173481
- config_name: unsplit
  features:
  - name: text
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          "0": sadness
          "1": joy
          "2": love
          "3": anger
          "4": fear
          "5": surprise
  splits:
  - name: train
    num_bytes: 45445685
    num_examples: 416809
  download_size: 15388281
  dataset_size: 45445685
train-eval-index:
- config: default
  task: text-classification
  task_id: multi_class_classification
  splits:
    train_split: train
    eval_split: test
  col_mapping:
    text: text
    label: target
  metrics:
  - type: accuracy
    name: Accuracy
  - type: f1
    name: F1 macro
    args:
      average: macro
  - type: f1
    name: F1 micro
    args:
      average: micro
  - type: f1
    name: F1 weighted
    args:
      average: weighted
  - type: precision
    name: Precision macro
    args:
      average: macro
  - type: precision
    name: Precision micro
    args:
      average: micro
  - type: precision
    name: Precision weighted
    args:
      average: weighted
  - type: recall
    name: Recall macro
    args:
      average: macro
  - type: recall
    name: Recall micro
    args:
      average: micro
  - type: recall
    name: Recall weighted
    args:
      average: weighted
---

# Dataset Card for "emotions"

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Additional Information](#additional-information)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)

## Dataset Description

- **Paper:** [CARER: Contextualized Affect Representations for Emotion Recognition](https://aclanthology.org/D18-1404/)
- **Size of downloaded dataset files:** 16.13 MB
- **Size of the generated dataset:** 47.62 MB
- **Total amount of disk used:** 63.75 MB

### Dataset Summary

Emotions is a dataset of English Twitter messages with six basic emotions: anger, fear, joy, love, sadness, and surprise. For more detailed information please refer to the paper. Note that the paper does contain a larger data set with eight emotions being considered.

## Dataset Structure

### Data Instances

An example bit of data looks like this:

```
{
  "text": "im feeling quite sad and sorry for myself but ill snap out of it soon",
  "label": 0
}
```

### Data Fields

The data fields are:

- `text`: a `string` feature.
- `label`: a classification label, with possible values including `sadness` (0), `joy` (1), `love` (2), `anger` (3), `fear` (4), `surprise` (5).

### Data Splits

The dataset has two configurations.

- split: with a total of 20,000 examples split into train, validation and test.
- unsplit: with a total of 416,809 examples in a single train split.

| name    |  train | validation | test |
| ------- | -----: | ---------: | ---: |
| split   |  16000 |       2000 | 2000 |
| unsplit | 416809 |        n/a |  n/a |

## Additional Information

### Licensing Information

The dataset should be used for educational and research purposes only. It is licensed under Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

### Citation Information

If you use this dataset, please cite:

```
@inproceedings{saravia-etal-2018-carer,
    title = "{CARER}: Contextualized Affect Representations for Emotion Recognition",
    author = "Saravia, Elvis and Liu, Hsien-Chi Toby and Huang, Yen-Hao and Wu, Junlin and Chen, Yi-Shin",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D18-1404",
    doi = "10.18653/v1/D18-1404",
    pages = "3687--3697",
    abstract = "Emotions are expressed in nuanced ways, which varies by collective or individual experiences, knowledge, and beliefs. Therefore, to understand emotion, as conveyed through text, a robust mechanism capable of capturing and modeling different linguistic nuances and phenomena is needed. We propose a semi-supervised, graph-based algorithm to produce rich structural descriptors which serve as the building blocks for constructing contextualized affect representations from text. The pattern-based representations are further enriched with word embeddings and evaluated through several emotion recognition tasks. Our experimental results demonstrate that the proposed method outperforms state-of-the-art techniques on emotion recognition tasks.",
}
```
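The integer-to-name label mapping listed under Data Fields can be sketched in plain Python as follows. This is a minimal illustration, not part of the dataset's tooling: `LABEL_NAMES`, `ID2LABEL`, `LABEL2ID`, and `decode_label` are hypothetical names introduced here, and only the label order is taken from the card.

```python
# Label names in index order, copied from the card's class_label definition.
LABEL_NAMES = ["sadness", "joy", "love", "anger", "fear", "surprise"]

# Lookup tables in both directions (hypothetical names, for illustration).
ID2LABEL = dict(enumerate(LABEL_NAMES))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_label(label_id: int) -> str:
    """Return the emotion name for an integer label."""
    if label_id not in ID2LABEL:
        raise ValueError(f"unknown label id: {label_id}")
    return ID2LABEL[label_id]


# The example instance shown in the card.
example = {
    "text": "im feeling quite sad and sorry for myself but ill snap out of it soon",
    "label": 0,
}
print(decode_label(example["label"]))  # prints "sadness"
```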


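Provider:

jeffnyman

Summary of Original Information

Dataset Overview

Dataset Name

Name: Emotions

License Information

License: cc-by-sa-4.0

Language

Language: English (en)

Size Category

Size: 10K<n<100K

Task Category

Task: text classification
Task ID: multi-class classification

Dataset Information

Configuration names: split and unsplit
Features:
- text: string
- label: class label, one of sadness (0), joy (1), love (2), anger (3), fear (4), surprise (5)
Splits:
- split:
  - train: 16000 examples, 1741597 bytes
  - validation: 2000 examples, 214703 bytes
  - test: 2000 examples, 217181 bytes
- unsplit:
  - train: 416809 examples, 45445685 bytes
Download size:
- split: 740883 bytes
- unsplit: 15388281 bytes
Dataset size:
- split: 2173481 bytes
- unsplit: 45445685 bytes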
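
The size figures above are internally consistent, which a few lines of arithmetic can confirm. The sketch below only re-adds the numbers quoted in this section; the variable names are hypothetical.

```python
# Figures copied from the metadata above for the "split" configuration.
SPLIT_CONFIG = {
    "train": {"examples": 16000, "bytes": 1741597},
    "validation": {"examples": 2000, "bytes": 214703},
    "test": {"examples": 2000, "bytes": 217181},
}
SPLIT_DATASET_SIZE = 2173481  # reported dataset size for the split config

total_examples = sum(s["examples"] for s in SPLIT_CONFIG.values())
total_bytes = sum(s["bytes"] for s in SPLIT_CONFIG.values())

# Share of examples that falls in each split.
proportions = {
    name: s["examples"] / total_examples for name, s in SPLIT_CONFIG.items()
}

print(total_examples)                     # 20000
print(total_bytes == SPLIT_DATASET_SIZE)  # True
print(proportions)                        # {'train': 0.8, 'validation': 0.1, 'test': 0.1}
```

The per-split byte counts sum to the reported dataset size, and the examples follow an 80/10/10 division.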

Training and Evaluation Metrics

Config: default
Task: text classification
Task ID: multi-class classification
Splits:
- Training split: train
- Evaluation split: test
Column mapping:
- text: text
- label: target
Metrics:
- Accuracy
- F1 macro
- F1 micro
- F1 weighted
- Precision macro
- Precision micro
- Precision weighted
- Recall macro
- Recall micro
- Recall weighted
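
The macro, micro, and weighted variants listed above differ only in how per-class scores are averaged. The sketch below computes the three F1 averages in plain Python on a made-up toy prediction; the function names are hypothetical, and in practice a library implementation would normally be used instead. Precision and recall are averaged in the same three ways.

```python
from collections import Counter


def per_class_counts(y_true, y_pred, labels):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed the true class t
    return {c: (tp[c], fp[c], fn[c]) for c in labels}


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), taken as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_scores(y_true, y_pred, labels):
    """Return macro, micro and weighted F1 for single-label predictions."""
    counts = per_class_counts(y_true, y_pred, labels)
    support = Counter(y_true)
    per_class = {c: f1_from_counts(*counts[c]) for c in labels}
    # Macro: unweighted mean of per-class F1.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: per-class F1 weighted by the number of true examples.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    # Micro: pool the counts over all classes, then compute one F1.
    tp = sum(v[0] for v in counts.values())
    fp = sum(v[1] for v in counts.values())
    fn = sum(v[2] for v in counts.values())
    micro = f1_from_counts(tp, fp, fn)
    return {"macro": macro, "micro": micro, "weighted": weighted}


# Made-up toy labels, not drawn from the dataset.
scores = f1_scores([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], labels=[0, 1, 2])
print(scores)
```

For single-label multi-class data such as this, micro F1 equals accuracy, so the macro and weighted averages are the ones that reveal uneven per-class performance.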

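Collected Summary

Dataset Introduction

Construction

Emotions is a dataset of English Twitter messages, each labeled with one of six basic emotions: anger, fear, joy, love, sadness, and surprise. The text was collected from Twitter, then filtered and annotated so that every message is paired with an emotion label. The split configuration divides 20,000 examples into training, validation, and test sets; the unsplit configuration holds 416,809 examples in a single train split.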
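
Loading either configuration might look like the sketch below. It assumes the third-party Hugging Face `datasets` package is installed and that network access is available; `load_emotions` and `check_split_sizes` are hypothetical helper names introduced here, and the expected sizes are simply the figures quoted on this page.

```python
# Expected split sizes per configuration, copied from the dataset card.
EXPECTED_SPLITS = {
    "split": {"train": 16000, "validation": 2000, "test": 2000},
    "unsplit": {"train": 416809},
}


def load_emotions(config: str = "split"):
    """Load one configuration of jeffnyman/emotions (hypothetical helper).

    Needs the `datasets` package and network access when called.
    """
    if config not in EXPECTED_SPLITS:
        raise ValueError(
            f"unknown config {config!r}; expected one of {sorted(EXPECTED_SPLITS)}"
        )
    # Imported lazily so this module can be imported without the package.
    from datasets import load_dataset

    return load_dataset("jeffnyman/emotions", config)


def check_split_sizes(dataset, config: str) -> bool:
    """Compare split sizes of a dict-like dataset with the card's figures."""
    actual = {name: len(split) for name, split in dataset.items()}
    return actual == EXPECTED_SPLITS[config]
```

A call such as `check_split_sizes(load_emotions("split"), "split")` would then confirm that the download matches the documented 16000/2000/2000 division.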

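Usage

Choose the split or unsplit configuration according to your needs. With split, train on the training set, tune on the validation set, and report results on the test set. With unsplit, you must partition the single train split yourself. Use is subject to the license (CC BY-SA 4.0), is limited to educational and research purposes, and requires citing the CARER paper.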
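
For the unsplit configuration, one reasonable way to partition the data is a stratified split that keeps each emotion's share roughly equal across the parts. The sketch below is an assumption about how a user might do this, not a procedure described by the dataset's authors; `stratified_split` is a hypothetical helper, and the 80/10/10 default simply mirrors the split configuration.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return (train, validation, test) index lists, stratified by label."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    # Group example indices by their label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = int(len(indices) * fractions[0])
        n_val = int(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # any remainder goes to test
    return train, val, test


# Made-up toy labels; these are not the dataset's real class counts.
toy_labels = [0] * 50 + [1] * 30 + [2] * 20
train_idx, val_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 80 10 10
```

The function returns indices rather than rows, so the same partition can be applied to any container holding the examples.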

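Background and Challenges

Background

Emotion recognition, which aims to identify and classify the emotional states expressed in text, is an active research topic in natural language processing. The data behind jeffnyman/emotions was introduced in 2018 by Saravia et al. in the paper "CARER: Contextualized Affect Representations for Emotion Recognition"; this Hugging Face release was published by jeffnyman in 2023. It consists of English Twitter messages covering six basic emotions: anger, fear, joy, love, sadness, and surprise, and is intended to serve as benchmark data for emotion recognition.

Current Challenges

The dataset poses two main challenges. First, emotional expression is complex and varied, which makes the classification task itself difficult. Second, on the construction side, the sample distribution across emotion classes has to be handled and balanced, and privacy and copyright concerns must be addressed while keeping the data authentic.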
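
One common heuristic for uneven class distributions is to weight each class inversely to its frequency, using weight = N / (K × count) for N examples and K classes. The sketch below illustrates that heuristic only; it is not a method described by the dataset's authors, `balanced_class_weights` is a hypothetical name, and the toy counts are invented rather than taken from the dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return weight[c] = N / (K * count[c]); rarer classes get larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


# Toy labels with made-up frequencies; NOT the dataset's real class counts.
toy_labels = [0] * 6 + [1] * 3 + [5] * 1
weights = balanced_class_weights(toy_labels)
for label, weight in sorted(weights.items()):
    print(label, round(weight, 3))
```

Such weights are typically passed to a loss function so that errors on rare classes count for more during training.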

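Common Scenarios

Typical Use Cases

Emotion classification is a core task for understanding how people express feelings in text. The jeffnyman/emotions dataset, a collection of English tweets labeled with six basic emotions, is suited to training and evaluating machine learning models for this task.

Academic Problems Addressed

The dataset helps with two recurring problems in research: inconsistent emotion label sets and a shortage of annotated data. Its fixed six-class scheme and large number of labeled examples let researchers compare models under a common setup.

Practical Applications

In practice, the dataset can support social media analysis, prediction of user sentiment trends, and the design of emotion-aware interactive systems. Such applications can help organizations understand user needs and improve user experience.

Recent Research on the Dataset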