A Multilingual Dataset for Religious Hate Speech Detection in Bangla, Banglish, and English.
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/bcybgsc4fy
Description:
This dataset addresses the challenge of detecting religious hate speech in social media comments, a
task that remains underexplored in low-resource and code-mixed language settings. Datasets focusing
solely on Banglish, which is common on social media, are also scarce. This dataset was therefore
created to address the lack of publicly available resources that jointly cover Bangla, Banglish
(Romanized Bangla), and English, particularly in the context of religious hate speech. The dataset
consists of 5,749 Bangla comments, 3,783 Banglish comments, and 1,386 English comments. The data
were collected from social media platforms, primarily YouTube and Facebook, using a combination of
official APIs, scrapers, and manual collection. The data were cleaned with both automated scripts
and manual inspection, then annotated with the assistance of domain experts. The dataset can be
used for downstream tasks including religious hate speech detection, toxicity analysis, and
code-mixed language research, and is particularly valuable for studying online conversations within
Bangla-speaking communities.
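The three per-language counts above imply a total of 10,918 comments, with Bangla the majority class. As a minimal sketch using only the figures stated in the description (no dataset files or column names are assumed, since the listing does not give them), the language distribution can be tabulated as follows:

```python
# Comment counts per language, as stated in the dataset description.
counts = {"Bangla": 5749, "Banglish": 3783, "English": 1386}

total = sum(counts.values())

# Percentage share of each language in the full dataset.
shares = {lang: 100 * n / total for lang, n in counts.items()}

for lang, n in counts.items():
    print(f"{lang:<9}{n:>6}  {shares[lang]:.1f}%")
print(f"{'Total':<9}{total:>6}")
```

The imbalance across languages may be worth accounting for, for example by reporting per-language metrics, when training or evaluating a classifier on the combined data.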
Created:
2026-03-02



