
GoEmotions

Source: www.kaggle.com | Updated: 2021-02-18 | Indexed: 2025-03-23
Download link:
https://www.kaggle.com/debarshichanda/goemotions
Resource overview:
### Context

GoEmotions is a corpus of 58k carefully curated comments extracted from Reddit, with human annotations for 27 emotion categories or Neutral.

- Number of examples: 58,009.
- Number of labels: 27 + Neutral.
- Maximum sequence length in training and evaluation datasets: 30.

### Content

On top of the raw data, we also include a version filtered based on rater agreement, which contains a train/test/validation split:

- Size of training dataset: 43,410.
- Size of test dataset: 5,427.
- Size of validation dataset: 5,426.

The emotion categories are: admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise.

`analyze_data.py` and `extract_words.py` have already been run, and the output files are stored in `plots` and `tables` respectively.

The data is taken from the original repository: [https://github.com/google-research/google-research/tree/master/goemotions](https://github.com/google-research/google-research/tree/master/goemotions)

However, some of the scripts were giving errors, which have been rectified and updated in this repository: [https://github.com/DebarshiChanda/google-research/tree/master/goemotions](https://github.com/DebarshiChanda/google-research/tree/master/goemotions)

### Acknowledgements

The entire credit for creating this dataset goes to the Google Research team.

### Inspiration

Detect emotion from text using multi-label classification.
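Because each comment can carry several of the 27 emotions (or Neutral) at once, a multi-label classifier usually expects one multi-hot target vector per comment. The following is a minimal sketch of that encoding, not the dataset author's code. It assumes the split files are tab-separated with the text in the first column and a comma-separated list of label ids in the second, and that ids follow the order of the category list above with Neutral last; verify both against the actual files before relying on it. The two sample rows and their ids are made up for illustration.

```python
import csv
import io

# Assumed label order: the category list from the description, Neutral last.
EMOTIONS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]


def multi_hot(label_ids, num_labels=len(EMOTIONS)):
    """Turn a list of label ids into a 0/1 vector of length num_labels."""
    vec = [0] * num_labels
    for i in label_ids:
        vec[i] = 1
    return vec


def read_examples(handle):
    """Yield (text, multi-hot vector) pairs from an assumed TSV layout."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        text, raw_ids = row[0], row[1]
        ids = [int(x) for x in raw_ids.split(",")]
        yield text, multi_hot(ids)


# Hypothetical rows standing in for a real split file.
SAMPLE = (
    "Thanks, that made my day!\t15,17\tid_001\n"
    "Not sure what you mean.\t6\tid_002\n"
)

examples = list(read_examples(io.StringIO(SAMPLE)))
for text, vec in examples:
    labels = [EMOTIONS[i] for i, v in enumerate(vec) if v]
    print(text, "->", labels)
```

To use it on a real split, replace `io.StringIO(SAMPLE)` with an opened file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to whichever split file you downloaded.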

Provider:
www.kaggle.com