
BADD: A Large-Scale Dataset for Arrogance Detection in the Bengali Language

Indexed by NIAID Data Ecosystem on 2026-05-10
Download link:
https://data.mendeley.com/datasets/fyzy2z8nzx
Description:
This dataset contains 46,128 labeled Bengali comments curated for the task of arrogance detection. While existing datasets focus heavily on hate speech or cyberbullying, this dataset addresses the subtle linguistic nuances of "arrogance", characterized by overbearing pride, lack of empathy, and social superiority, which is often expressed without overt toxicity. The data was compiled to support research in Bengali NLP. It serves as the primary resource for training the high-performing BanglaBERT model (96% accuracy) described in the accompanying research paper.

Dataset Structure

The dataset is provided in a single .csv file with the following columns:

comment: The raw Bengali text.
source: The origin of the comment (online or AI).
weak_label: Initial label assigned by heuristic functions.
snorkel_label: Refined label produced by the Snorkel framework.
final_label: The target label for classification (1: Arrogant, 0: Non-arrogant).

In addition, an automatically translated English dataset is attached as test_translated_data.csv.
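The column layout described above can be checked with a short script before any training run. The following is a minimal sketch using only the Python standard library. It assumes the labels are stored as the integers 0 and 1, and it runs on a tiny made-up sample, because the description does not give the name of the main .csv file. The helper names `load_rows` and `label_counts` are hypothetical and are not part of the dataset.

```python
import csv
import io
from collections import Counter

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = ["comment", "source", "weak_label", "snorkel_label", "final_label"]


def load_rows(handle):
    """Read rows from an open CSV handle and check the expected columns exist."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def label_counts(rows):
    """Count rows per final_label (1 = arrogant, 0 = non-arrogant).

    Assumes final_label is stored as an integer string such as "0" or "1".
    """
    return Counter(int(row["final_label"]) for row in rows)


# Tiny made-up sample in the described layout (placeholder text, not real data).
SAMPLE = io.StringIO(
    "comment,source,weak_label,snorkel_label,final_label\n"
    "placeholder comment A,online,1,1,1\n"
    "placeholder comment B,AI,0,0,0\n"
    "placeholder comment C,online,1,0,0\n"
)

rows = load_rows(SAMPLE)
counts = label_counts(rows)
```

To run the same check on the downloaded file, pass a handle opened with `open(path, encoding="utf-8", newline="")` in place of `SAMPLE`, where `path` points to wherever the .csv was saved. Opening the file explicitly as UTF-8 avoids mangling the Bengali text on platforms whose default encoding differs.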
Created:
2026-03-12