jeffnyman/emotions
Source: Hugging Face · Updated: 2023-07-29 · Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/jeffnyman/emotions
Resource description:
---
pretty_name: Emotions
license: cc-by-sa-4.0
language:
- en
size_categories:
- 10K<n<100K
task_categories:
- text-classification
task_ids:
- multi-class-classification
tags:
- emotion-classification
dataset_info:
- config_name: split
  features:
  - name: text
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          "0": sadness
          "1": joy
          "2": love
          "3": anger
          "4": fear
          "5": surprise
  splits:
  - name: train
    num_bytes: 1741597
    num_examples: 16000
  - name: validation
    num_bytes: 214703
    num_examples: 2000
  - name: test
    num_bytes: 217181
    num_examples: 2000
  download_size: 740883
  dataset_size: 2173481
- config_name: unsplit
  features:
  - name: text
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          "0": sadness
          "1": joy
          "2": love
          "3": anger
          "4": fear
          "5": surprise
  splits:
  - name: train
    num_bytes: 45445685
    num_examples: 416809
  download_size: 15388281
  dataset_size: 45445685
train-eval-index:
- config: default
  task: text-classification
  task_id: multi_class_classification
  splits:
    train_split: train
    eval_split: test
  col_mapping:
    text: text
    label: target
  metrics:
  - type: accuracy
    name: Accuracy
  - type: f1
    name: F1 macro
    args:
      average: macro
  - type: f1
    name: F1 micro
    args:
      average: micro
  - type: f1
    name: F1 weighted
    args:
      average: weighted
  - type: precision
    name: Precision macro
    args:
      average: macro
  - type: precision
    name: Precision micro
    args:
      average: micro
  - type: precision
    name: Precision weighted
    args:
      average: weighted
  - type: recall
    name: Recall macro
    args:
      average: macro
  - type: recall
    name: Recall micro
    args:
      average: micro
  - type: recall
    name: Recall weighted
    args:
      average: weighted
---
# Dataset Card for "emotions"
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Additional Information](#additional-information)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
## Dataset Description
- **Paper:** [CARER: Contextualized Affect Representations for Emotion Recognition](https://aclanthology.org/D18-1404/)
- **Size of downloaded dataset files:** 16.13 MB
- **Size of the generated dataset:** 47.62 MB
- **Total amount of disk used:** 63.75 MB
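
The three sizes above appear to be the sums of the per-configuration `download_size` and `dataset_size` values in the card's metadata. The following sketch checks that arithmetic; it assumes decimal megabytes (1 MB = 1,000,000 bytes), which is an inference from the numbers rather than something the card states.

```python
# Byte counts copied from the card's YAML metadata.
DOWNLOAD_BYTES = {"split": 740_883, "unsplit": 15_388_281}
DATASET_BYTES = {"split": 2_173_481, "unsplit": 45_445_685}


def to_mb(num_bytes: int) -> float:
    """Convert bytes to decimal megabytes (assumed unit), rounded to 2 places."""
    return round(num_bytes / 1_000_000, 2)


download_total = sum(DOWNLOAD_BYTES.values())
generated_total = sum(DATASET_BYTES.values())

download_mb = to_mb(download_total)
generated_mb = to_mb(generated_total)
disk_mb = to_mb(download_total + generated_total)

print(download_mb, generated_mb, disk_mb)  # 16.13 47.62 63.75
```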
### Dataset Summary
Emotions is a dataset of English Twitter messages labeled with six basic emotions: anger, fear, joy, love, sadness, and surprise. For more detail, refer to the paper. Note that the paper describes a larger dataset that covers eight emotions.
## Dataset Structure
### Data Instances
An example instance looks like this:
```
{
"text": "im feeling quite sad and sorry for myself but ill snap out of it soon",
"label": 0
}
```
### Data Fields
The data fields are:
- `text`: a `string` feature.
- `label`: a classification label, one of `sadness` (0), `joy` (1), `love` (2), `anger` (3), `fear` (4), `surprise` (5).
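
As a minimal sketch, the integer-to-name mapping described above can be written as a plain list lookup. The helper names `label_to_name` and `name_to_label` are hypothetical and not part of the dataset; the label order is taken from the card.

```python
# Label names in id order, as listed in the card (index == label id).
LABEL_NAMES = ["sadness", "joy", "love", "anger", "fear", "surprise"]


def label_to_name(label: int) -> str:
    """Map an integer label (0-5) to its emotion name."""
    return LABEL_NAMES[label]


def name_to_label(name: str) -> int:
    """Map an emotion name back to its integer label."""
    return LABEL_NAMES.index(name)


# The example instance shown in the card.
example = {
    "text": "im feeling quite sad and sorry for myself but ill snap out of it soon",
    "label": 0,
}
print(label_to_name(example["label"]))  # sadness
```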
### Data Splits
The dataset has two configurations.
- split: with a total of 20,000 examples split into train, validation and test.
- unsplit: with a total of 416,809 examples in a single train split.
| name | train | validation | test |
| ------- | -----: | ---------: | ---: |
| split | 16000 | 2000 | 2000 |
| unsplit | 416809 | n/a | n/a |
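
The sketch below shows one way to load a configuration and compare it against the table above. It assumes the Hugging Face `datasets` library is installed and that `load_dataset("jeffnyman/emotions", <config>)` works unchanged for this repository, which depends on the installed library version and on how the repository is packaged. `EXPECTED_SIZES`, `load_config`, `check_sizes`, and `split_fractions` are hypothetical names introduced for illustration.

```python
# Split sizes copied from the table in the card.
EXPECTED_SIZES = {
    "split": {"train": 16_000, "validation": 2_000, "test": 2_000},
    "unsplit": {"train": 416_809},
}


def load_config(config_name: str):
    """Load one configuration (needs the `datasets` package and network access)."""
    if config_name not in EXPECTED_SIZES:
        raise ValueError(f"unknown configuration: {config_name!r}")
    # Imported lazily so the helpers below also work without the library.
    from datasets import load_dataset
    return load_dataset("jeffnyman/emotions", config_name)


def check_sizes(dataset_dict, config_name: str) -> bool:
    """Return True if every split has the number of rows the card reports."""
    actual = {name: len(split) for name, split in dataset_dict.items()}
    return actual == EXPECTED_SIZES[config_name]


def split_fractions(config_name: str) -> dict:
    """Fraction of the configuration's examples that falls in each split."""
    sizes = EXPECTED_SIZES[config_name]
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


print(split_fractions("split"))
```

With network access, `check_sizes(load_config("split"), "split")` would confirm the counts; `split_fractions("split")` shows that the `split` configuration follows an 80/10/10 partition.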
## Additional Information
### Licensing Information
The dataset should be used for educational and research purposes only. It is licensed under Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
### Citation Information
If you use this dataset, please cite:
```
@inproceedings{saravia-etal-2018-carer,
title = "{CARER}: Contextualized Affect Representations for Emotion Recognition",
author = "Saravia, Elvis and
Liu, Hsien-Chi Toby and
Huang, Yen-Hao and
Wu, Junlin and
Chen, Yi-Shin",
booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
month = oct # "-" # nov,
year = "2018",
address = "Brussels, Belgium",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/D18-1404",
doi = "10.18653/v1/D18-1404",
pages = "3687--3697",
abstract = "Emotions are expressed in nuanced ways, which varies by collective or individual experiences, knowledge, and beliefs. Therefore, to understand emotion, as conveyed through text, a robust mechanism capable of capturing and modeling different linguistic nuances and phenomena is needed. We propose a semi-supervised, graph-based algorithm to produce rich structural descriptors which serve as the building blocks for constructing contextualized affect representations from text. The pattern-based representations are further enriched with word embeddings and evaluated through several emotion recognition tasks. Our experimental results demonstrate that the proposed method outperforms state-of-the-art techniques on emotion recognition tasks.",
}
```
Provider:
jeffnyman
Original information summary
Dataset overview
Dataset name
- Name: Emotions
License
- License: cc-by-sa-4.0
Language
- Language: English (en)
Size category
- Size: 10K<n<100K
Task categories
- Task: text classification
- Task ID: multi-class classification
Tags
- Tag: emotion classification
Dataset information
- Configuration names: split and unsplit
- Features:
  - text: string
  - label: class label, one of sadness (0), joy (1), love (2), anger (3), fear (4), surprise (5)
- Splits:
  - split:
    - train: 16000 records, 1741597 bytes
    - validation: 2000 records, 214703 bytes
    - test: 2000 records, 217181 bytes
  - unsplit:
    - train: 416809 records, 45445685 bytes
- Download size:
  - split: 740883 bytes
  - unsplit: 15388281 bytes
- Dataset size:
  - split: 2173481 bytes
  - unsplit: 45445685 bytes
Training and evaluation metrics
- Configuration: default
- Task: text classification
- Task ID: multi-class classification
- Splits:
  - Training split: train
  - Evaluation split: test
- Column mapping:
  - text: text
  - label: target
- Metrics:
  - Accuracy
  - F1 macro
  - F1 micro
  - F1 weighted
  - Precision macro
  - Precision micro
  - Precision weighted
  - Recall macro
  - Recall micro
  - Recall weighted
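
The macro, micro, and weighted averaging modes named in the metrics list can be sketched in plain Python as follows. This follows the usual definitions for single-label multi-class classification and is not taken from the dataset's own evaluation code; `f1_scores` and its helpers are hypothetical names, and the toy labels are invented for illustration.

```python
from collections import Counter


def _counts(y_true, y_pred):
    """Per-class true positives, false positives, and false negatives."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    return tp, fp, fn


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when the class never occurs."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_scores(y_true, y_pred, labels):
    """Return macro, micro, and weighted F1 over the given label ids."""
    tp, fp, fn = _counts(y_true, y_pred)
    per_class = {c: _f1(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(y_true)
    return {
        # macro: unweighted mean of per-class F1
        "macro": sum(per_class.values()) / len(labels),
        # micro: F1 computed from counts pooled over all classes
        "micro": _f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        # weighted: per-class F1 weighted by the number of true examples
        "weighted": sum(per_class[c] * support[c] for c in labels) / len(y_true),
    }


# Toy, imbalanced example: label 5 (surprise) is rare and always missed.
toy_true = [0, 0, 0, 0, 1, 1, 5]
toy_pred = [0, 0, 0, 1, 1, 1, 0]
print(f1_scores(toy_true, toy_pred, labels=[0, 1, 5]))
```

In this single-label setting micro F1 equals accuracy, while macro F1 drops sharply when a rare class is missed, which is why reporting all three averages is informative on imbalanced emotion data.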
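Collected summary
Dataset introduction

Construction
The dataset, named Emotions, consists of English Twitter messages labeled for classification into six basic emotions: anger, fear, joy, love, sadness, and surprise. It was built from text posted on Twitter that was filtered and annotated so that each text is paired with an emotion label. The split configuration divides 20,000 examples into train, validation, and test sets; the unsplit configuration contains 416,809 examples in a single train split.
Usage
Choose the split or unsplit configuration according to your research needs. With split, train on the train set, tune on the validation set, and evaluate on the test set. With unsplit, you must partition the data yourself to obtain a set for each stage. Use of the dataset must comply with its license, is limited to educational and research purposes, and requires citing the associated paper.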
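
For the unsplit configuration, one possible way to partition the data is a stratified split that keeps the emotion proportions similar in every subset. The sketch below is an illustration only, not the procedure used to build the published split configuration; `stratified_split`, the 80/10/10 fractions, and the seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return (train, validation, test) lists of row indices, stratified by label."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    # Group row indices by their label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, validation, test = [], [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_train = min(round(len(indices) * fractions[0]), len(indices))
        n_val = min(round(len(indices) * fractions[1]), len(indices) - n_train)
        train += indices[:n_train]
        validation += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to test
    return train, validation, test


# Toy label column with an imbalanced class distribution.
toy_labels = [0] * 50 + [1] * 30 + [5] * 20
train_idx, val_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 80 10 10
```

The returned index lists can then be used to select rows from whatever container holds the loaded data.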
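Background and challenges
Background overview
In natural language processing, emotion recognition is an active research topic that aims to identify and classify the emotional states expressed in text. The 'jeffnyman/emotions' dataset draws on data introduced in 2018 by Saravia et al. and described in detail in the paper "CARER: Contextualized Affect Representations for Emotion Recognition". It contains English Twitter messages covering six basic emotions: anger, fear, joy, love, sadness, and surprise. The dataset is intended to provide benchmark data for emotion recognition and to advance research in the field.
Current challenges
The dataset poses two main challenges. First, emotional expression is complex and varied, which makes classification difficult. Second, building such a dataset requires handling and balancing the sample distribution across emotion classes, and addressing privacy and copyright concerns while keeping the data authentic.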
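Common scenarios
Classic use cases
In natural language processing, sentiment analysis is an important task for understanding how people express emotion. The jeffnyman/emotions dataset, a representative resource for emotion classification of English tweets, is widely used to train and evaluate machine learning models. It covers six basic emotions and offers ample data for building and tuning emotion classifiers.
Academic problems addressed
The dataset addresses two problems in academic research: inconsistent emotion classification standards and a shortage of labeled data. By providing labeled data at scale, it lets researchers evaluate different models within a single framework, which has helped advance emotion analysis.
Practical applications
In practice, the jeffnyman/emotions dataset is used for social media analysis, prediction of user sentiment trends, and the design of emotion-driven interactive systems. Such applications can help companies understand user needs and improve user experience.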
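Recent research on the dataset
Latest research directions
In emotion recognition, emotion classification based on Twitter data is attracting growing attention. Recent work with the 'jeffnyman/emotions' dataset focuses on fine-grained emotion recognition, in particular accurate classification of the six basic emotions: anger, fear, joy, love, sadness, and surprise. Researchers are using deep learning models that incorporate context to capture and model linguistic nuances and phenomena, with the goal of improving the accuracy and robustness of emotion recognition. Work on this dataset has theoretical and practical value for emotion analysis in applications such as emotion-driven recommender systems and intelligent customer service.
The content above was collected and summarized by 遇见数据集.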



