Bengali Political Sentiment Analysis Dataset

NIAID Data Ecosystem, indexed 2026-05-10

Download link:

https://data.mendeley.com/datasets/x5yc4m5yg2


Description:

This dataset comprises 3,290 Bengali political comments sourced from social media platforms, news comment sections, and online political discussions, curated for sentiment analysis research in Bengali NLP. It is intended as a resource for training and evaluating sentiment classification models in the political domain.

The 3,290 instances are distributed across five closely balanced sentiment classes: Very Negative (675, 20.5%), Negative (663, 20.2%), Neutral (626, 19.0%), Very Positive (664, 20.2%), and Positive (662, 20.1%). The source reports the balance as "variance <8%"; the largest class is about 7.8% larger than the smallest. The data is stored in Excel format with two columns: the Bengali political comment (Unicode text) and its sentiment label. The source states that there are no missing values and that the annotations are verified. Comments average 83 characters in length and range from 11 to 398 characters.

The collection covers government policies and governance, electoral processes and democracy, political parties and leadership dynamics, social and economic issues, current affairs and political events, and public opinion and citizen responses to political developments.

Intended research uses include Bengali sentiment analysis model development and benchmarking, political discourse analysis and opinion mining, NLP research for low-resource languages, cross-lingual sentiment analysis, social media analytics for Bengali content, multi-class text classification, and comparative political sentiment studies across linguistic and cultural contexts.
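The class counts stated in the description can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the counts as published, and the helper names (`class_shares`, `max_relative_gap`, `counts_from_labels`) are hypothetical, not part of the dataset or any library.

```python
from collections import Counter

# Class counts exactly as stated in the dataset description.
STATED_COUNTS = {
    "Very Negative": 675,
    "Negative": 663,
    "Neutral": 626,
    "Positive": 662,
    "Very Positive": 664,
}


def class_shares(counts):
    """Return each class's share of the total in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def max_relative_gap(counts):
    """Return the percentage by which the largest class exceeds the smallest."""
    largest, smallest = max(counts.values()), min(counts.values())
    return 100 * (largest - smallest) / smallest


def counts_from_labels(labels):
    """Build a count dictionary from an iterable of label values."""
    return dict(Counter(labels))


total = sum(STATED_COUNTS.values())
shares = class_shares(STATED_COUNTS)
gap = max_relative_gap(STATED_COUNTS)

print("total instances:", total)
for label, pct in shares.items():
    print(f"{label}: {pct}%")
print(f"largest class exceeds smallest by {gap:.1f}%")
```

With the actual file, the label column could be read with `pandas.read_excel` and passed to `counts_from_labels` to repeat the same check. The description does not give the file name or column names, so those would need to be inspected first.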


Created:

2025-10-02
