five

A Multilingual Dataset for Religious Hate Speech Detection in Bangla, Banglish, and English.

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/bcybgsc4fy
Description:
This dataset addresses the challenge of detecting religious hate speech in social media comments, a task that remains underexplored in low-resource and code-mixed language settings. Datasets that focus solely on Banglish, which is common on social media, are also scarce. The dataset was therefore created to address the lack of publicly available resources that jointly cover Bangla, Banglish (Romanized Bangla), and English, particularly in the context of religious hate speech. It consists of 5,749 Bangla comments, 3,783 Banglish comments, and 1,386 English comments. The data were collected from social media platforms, primarily YouTube and Facebook, using a combination of official APIs, scrapers, and manual collection. Both automated scripts and manual inspection were used for data cleaning, followed by a careful annotation process conducted with the assistance of domain experts. The dataset can be used for various downstream tasks, including religious hate speech detection, toxicity analysis, and code-mixed language research, and is particularly valuable for studying online conversations within Bangla-speaking communities.
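As a quick illustration of the language split described above, the following Python sketch computes each language's share of the corpus from the three stated comment counts. Only the counts come from the description; the names `counts` and `language_shares` are hypothetical, and nothing here reflects the actual file layout or column names of the dataset, which the description does not specify.

```python
# Comment counts per language, taken from the dataset description.
counts = {"Bangla": 5749, "Banglish": 3783, "English": 1386}


def language_shares(counts):
    """Return each language's fraction of the total comment count."""
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


total = sum(counts.values())
shares = language_shares(counts)

# Print one line per language, then the overall total.
for lang, share in shares.items():
    print(f"{lang}: {counts[lang]} comments ({share:.1%})")
print(f"Total: {total} comments")
```

The three subsets sum to 10,918 comments, with Bangla making up a little over half and English roughly an eighth. Anyone training a single multilingual classifier on this data may want to account for that imbalance, for example by stratifying the train/test split by language.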
Created:
2026-03-02