
BanglaEmotion (BanglaEmotion: A Benchmark Dataset for Bangla Textual Emotion Analysis)

Source: OpenDataLab. Updated: 2026-05-31. Indexed: 2024-05-09.
Download link:
https://opendatalab.org.cn/OpenDataLab/BanglaEmotion
Description:
BanglaEmotion is a manually annotated Bangla emotion corpus that captures the diversity of fine-grained emotional expression in social media text. The labels are sadness, happiness, disgust, surprise, fear, and anger, the six basic emotion categories proposed by Paul Ekman (1999). The raw text was collected from user comments on two Facebook groups (Ekattor TV and Airport Magistrates) as well as public posts by the popular blogger and activist Dr. Imran H Sarker. The comments are mainly reactions to current socio-political issues and to the successes and failures of Bangladesh’s economy. A total of 32,923 comments were crawled from these three sources, of which 6,314 were annotated into the six emotion categories. The annotated corpus is distributed as follows: sadness = 1,341; happiness = 1,908; disgust = 703; surprise = 562; fear = 384; anger = 1,416. A balanced subset is also provided, and the dataset is split into training and test sets at a 5:1 ratio for training and evaluation. More information about the dataset and its experiments can be found in our paper (the relevant link is provided below).
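
The class counts and the 5:1 ratio above can be checked with a few lines of Python. This is only a minimal sketch: the counts are copied from the description, while the helper `split_5_to_1` and the per-class split sizes it prints are hypothetical illustrations, not the official split shipped with the dataset.

```python
# Annotated comment counts per emotion, as reported in the description above.
counts = {
    "sadness": 1341,
    "happiness": 1908,
    "disgust": 703,
    "surprise": 562,
    "fear": 384,
    "anger": 1416,
}

# Total number of annotated comments.
total = sum(counts.values())

# Share of the annotated corpus held by each class.
shares = {label: n / total for label, n in counts.items()}

# Ratio between the largest and the smallest class.
imbalance = max(counts.values()) / min(counts.values())


def split_5_to_1(n):
    """Hypothetical helper: split n items so that train:test is about 5:1."""
    test = n // 6
    return n - test, test


# Print classes from largest to smallest with an illustrative 5:1 split.
for label, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    train, test = split_5_to_1(n)
    print(f"{label:<10} {n:>5} {shares[label]:6.1%}  train={train} test={test}")

print(f"total={total}, largest/smallest class ratio={imbalance:.2f}")
```

The largest class (happiness) is roughly five times the size of the smallest (fear), which is presumably why a balanced subset is offered alongside the full annotated corpus.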
Provider:
OpenDataLab
Created:
2022-05-23
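Background overview
BanglaEmotion is a benchmark dataset for Bangla textual emotion analysis. It contains 6,314 manually annotated comments collected from social media, covering six basic emotion categories. The dataset provides a balanced training and test split and is suited to emotion classification tasks.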