Anubhuti

arXiv, updated 2020-10-07, indexed 2024-08-06

Download link:

http://arxiv.org/abs/2010.03065v1


Resource overview:

Anubhuti is the first large-scale text corpus for sentiment analysis of Bengali short stories, created by the Department of Computer Science and Engineering, Birla Institute of Technology, Mesra. It contains 159 short stories by different authors across 10 genres, totaling more than 30,000 sentences. Sentences were segmented with automated scripts and then manually labeled by three annotators with strong Bengali-language backgrounds, which yielded high inter-annotator agreement. The corpus supports sentiment classification with machine learning and deep learning models, and it gives linguists and data analysts a resource for studying emotional expression in Bengali literature. It is intended to address the scarcity of resources for Bengali sentiment analysis.
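
The description mentions automated sentence segmentation but does not say how the scripts work. As a rough illustration only, the sketch below splits Bengali text on an assumed set of sentence terminators: the danda (।, U+0964), the question mark, and the exclamation mark. The function name `split_sentences` and the choice of terminators are hypothetical and are not taken from the dataset's own tooling.

```python
import re

# Assumption: a sentence ends at a Bengali danda (U+0964), '?' or '!'.
# The actual Anubhuti scripts may use different rules.
_SENTENCE = re.compile(r"[^\u0964?!]+[\u0964?!]*")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, keeping each terminator with its sentence."""
    return [m.strip() for m in _SENTENCE.findall(text) if m.strip()]


if __name__ == "__main__":
    sample = "আমি ভালো আছি। তুমি কেমন আছো? দারুণ!"
    for sentence in split_sentences(sample):
        print(sentence)
```

A rule this simple would mis-split on abbreviations or quoted dialogue, so a real pipeline would likely need extra handling before the sentences go to annotators.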

Provided by:

Department of Computer Science and Engineering, Birla Institute of Technology, Mesra

Created:

2020-10-07
