ToxLex_bn: A Curated Dataset of Bangla Toxic Language Derived from Facebook Comment

Mendeley Data, indexed 2026-04-18

Download link:

https://data.mendeley.com/datasets/9pz8ssmc49

Description:

ToxLex, or Lexicon of Toxic Language, is a dataset of aggressive and abusive words used on social media. Specifically, it contains utterances taken from user-generated Facebook comments. The texts cover the demographic and thematic distribution of toxic language in Bangla on social media. The data were extracted from 8 publicly accessible Facebook pages. The dataset is curated, de-duplicated, and anonymized, and is derived from raw comments. It contains 1959 rows and 8 columns; each row represents a toxic bigram together with its features, such as transcriptions, translation, spelling standards, and degree of toxicity. The dataset was annotated by a single human annotator and curated for use in building classifiers for toxic language detection systems. It can also serve as a wordlist of Bangla cyberbullying, hate speech, and slang terms. Warning: this dataset contains text content that may be distressing or upsetting.
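The description states a specific shape (1959 rows, 8 columns) and says the data are de-duplicated. A minimal Python sketch of how a downloaded copy might be checked against those claims is given below. It assumes the data are available as CSV text with a header row, which the listing does not confirm; the function names `summarize` and `matches_description` are hypothetical and are not part of the dataset or of Mendeley Data.

```python
import csv
import io

EXPECTED_ROWS = 1959  # row count stated in the description
EXPECTED_COLS = 8     # column count stated in the description


def summarize(csv_text, has_header=True):
    """Return (n_rows, n_cols, n_duplicate_rows) for CSV text.

    Assumption: the file is comma-separated with one header row.
    """
    # Skip completely empty lines so they are not counted as rows.
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    if has_header and rows:
        rows = rows[1:]
    n_cols = max((len(r) for r in rows), default=0)
    # Rows that are exact copies of an earlier row count as duplicates.
    n_dupes = len(rows) - len({tuple(r) for r in rows})
    return len(rows), n_cols, n_dupes


def matches_description(csv_text):
    """True if the shape and de-duplication claims hold for this text."""
    n_rows, n_cols, n_dupes = summarize(csv_text)
    return (n_rows == EXPECTED_ROWS
            and n_cols == EXPECTED_COLS
            and n_dupes == 0)
```

To apply it to a downloaded file, read the file as UTF-8 text (Bangla script requires a Unicode encoding) and pass the contents to `matches_description`. The actual file name and format in the Mendeley archive should be checked first, since neither is given here.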

Created:

2022-04-27
