BABSA: A Large Scale Bangla Aspect Based Sentiment Analysis Dataset
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/j7yb2sv263
Description:
BABSA (Bangla Aspect-Based Sentiment Analysis) is a large-scale, manually annotated dataset for fine-grained aspect-level sentiment analysis in Bangla. The dataset contains 15,860 quality-controlled instances spanning 21 domains, including book reviews, product reviews, news, politics, and social commentary.
Dataset Files:
final_set.csv – Primary release file containing 15,860 manually annotated instances after quality control and filtering.
total.csv – Complete collection of 24,653 instances prior to filtering, provided for transparency and reproducibility.
Schema of final_set.csv:
text_content – Full Bangla text (review, comment, or news excerpt).
AnnotatedAspect – Comma-separated list of aspect terms or phrases extracted from the text.
AnnotatedSentiment – Comma-separated list of sentiment polarities (positive, neutral, or negative).
MacroCategory – Domain/topic category (one of the 21 predefined categories).
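Because AnnotatedAspect and AnnotatedSentiment are stored as parallel comma-separated strings, a typical first step is to pair them up per row. The sketch below assumes that the i-th sentiment labels the i-th aspect and that aspect phrases never contain a literal comma; neither is stated explicitly above, and the function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def split_field(value: str) -> List[str]:
    # Split one comma-separated annotation cell; strip spaces, drop empty items.
    return [item.strip() for item in (value or "").split(",") if item.strip()]


def read_final_set(lines: Iterable[str]) -> List[Dict]:
    """Turn final_set.csv rows into (aspect, sentiment) pairs.

    Assumption: the i-th AnnotatedSentiment entry labels the i-th
    AnnotatedAspect entry. Rows whose counts differ raise ValueError
    instead of being silently mis-paired.
    """
    records = []
    for row in csv.DictReader(lines):
        aspects = split_field(row["AnnotatedAspect"])
        sentiments = split_field(row["AnnotatedSentiment"])
        if len(aspects) != len(sentiments):
            raise ValueError(f"{len(aspects)} aspects but {len(sentiments)} labels")
        records.append({
            "text": row["text_content"],
            "category": row["MacroCategory"],
            "pairs": list(zip(aspects, sentiments)),
        })
    return records
```

To read the released file, pass an open handle, e.g. `with open("final_set.csv", encoding="utf-8", newline="") as f: records = read_final_set(f)`. UTF-8 is an assumption, since the encoding is not stated. Raising on a count mismatch makes misaligned rows visible rather than silently truncating them via `zip`.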
Schema of total.csv:
text_content – Full Bangla text (review, comment, or news excerpt).
AnnotatedAspect – Comma-separated list of aspect terms or phrases extracted from the text.
AnnotatedSentiment – Comma-separated list of sentiment polarities (positive, neutral, or negative).
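Since total.csv keeps the unfiltered rows, a quick consistency audit can help when comparing it with final_set.csv. The sketch below is a hypothetical sanity check only; it does not reproduce the authors' quality-control filter, whose criteria are not described here. It counts rows with no aspect, with mismatched aspect/label counts, or with a label outside the three documented polarities.

```python
import csv
from collections import Counter
from typing import Iterable

ALLOWED_LABELS = {"positive", "neutral", "negative"}


def audit_rows(lines: Iterable[str]) -> Counter:
    """Tally simple consistency issues in total.csv-style rows.

    Hypothetical check: not the dataset authors' filtering procedure.
    Missing cells (None from DictReader) are treated as empty strings.
    """
    issues = Counter()
    for row in csv.DictReader(lines):
        issues["rows"] += 1
        aspects = [a.strip() for a in (row["AnnotatedAspect"] or "").split(",") if a.strip()]
        labels = [s.strip().lower() for s in (row["AnnotatedSentiment"] or "").split(",") if s.strip()]
        if not aspects:
            issues["no_aspect"] += 1
        if len(aspects) != len(labels):
            issues["count_mismatch"] += 1
        if any(label not in ALLOWED_LABELS for label in labels):
            issues["unknown_label"] += 1
    return issues
```

For reference, the published counts imply that 24,653 - 15,860 = 8,793 instances (about 35.7%) were removed between total.csv and final_set.csv.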
Text content was aggregated from four publicly available Bangla NLP corpora (BanglaBook, SentNoB, EmoNoBa, Sazzed) and a web-scraped Bangla news corpus (January–June 2025). All aspect-level annotations (aspect terms, boundaries, and sentiment labels) are original contributions created through a three-pass manual annotation protocol, achieving inter-annotator agreement of Cohen's κ ≥ 0.84.
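The description reports Cohen's κ ≥ 0.84 but does not say which annotation units (aspect spans or sentiment labels) or which annotation passes were compared. The sketch below shows only the standard two-annotator formula, κ = (p_o − p_e) / (1 − p_e), applied to per-item labels; the function name is hypothetical, and scikit-learn's `cohen_kappa_score` computes the same unweighted quantity.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two annotators labelling the same items.

    p_o is the observed agreement rate; p_e is the agreement expected by
    chance from each annotator's own label frequencies.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label everywhere: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)
```

With six items on which two annotators agree five times, for example, κ is noticeably lower than the raw 83% agreement because chance agreement is subtracted out.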
Created:
2025-12-08



