
SentiGOLD

arXiv. Updated 2023-06-09; indexed 2024-06-21.
Download link:
https://sentiment.bangla.gov.bd
Description:
SentiGOLD is a large-scale, multi-domain Bengali sentiment analysis dataset of 70,000 samples, created by the Bangladesh Computer Council. The data was collected from diverse sources, including online video comments, social media posts, blog articles, and news, and was annotated by a gender-balanced team of linguists. The dataset follows a standard set of linguistic conventions, covers 30 domains (politics, entertainment, sports, and others), and uses five annotation classes: Strongly Negative, Slightly Negative, Neutral, Slightly Positive, and Strongly Positive. It was built through a rigorous process intended to ensure high-quality annotations and broad applicability, with the aim of addressing the challenges of Bengali sentiment analysis.
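As a rough illustration of the five-class scheme described above, the sketch below maps the class names to ordinal integer codes and collapses them to a coarser three-way polarity. The integer values, the `Sentiment` enum, and the helpers `encode_label` and `to_polarity` are assumptions made for this example; the listing does not say how the released files encode labels.

```python
from enum import IntEnum


class Sentiment(IntEnum):
    """Five SentiGOLD classes; the integer codes are an assumed ordinal encoding."""
    STRONGLY_NEGATIVE = -2
    SLIGHTLY_NEGATIVE = -1
    NEUTRAL = 0
    SLIGHTLY_POSITIVE = 1
    STRONGLY_POSITIVE = 2


# Class names exactly as given in the dataset description.
LABEL_NAMES = {
    "Strongly Negative": Sentiment.STRONGLY_NEGATIVE,
    "Slightly Negative": Sentiment.SLIGHTLY_NEGATIVE,
    "Neutral": Sentiment.NEUTRAL,
    "Slightly Positive": Sentiment.SLIGHTLY_POSITIVE,
    "Strongly Positive": Sentiment.STRONGLY_POSITIVE,
}


def encode_label(name: str) -> Sentiment:
    """Convert a class name to its (assumed) ordinal code, ignoring case and extra spaces."""
    key = " ".join(name.split()).title()
    if key not in LABEL_NAMES:
        raise ValueError(f"unknown sentiment label: {name!r}")
    return LABEL_NAMES[key]


def to_polarity(label: Sentiment) -> str:
    """Collapse the five classes to negative / neutral / positive."""
    if label < 0:
        return "negative"
    if label > 0:
        return "positive"
    return "neutral"
```

An ordinal encoding like this keeps the ordering between classes, which can be useful when a model or metric should treat confusing Strongly Negative with Slightly Negative as a smaller error than confusing it with Strongly Positive.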
Provider:
Bangladesh Computer Council
Created:
2023-06-09