Toxicity Detection Dataset in Twi Language

Mendeley Data, indexed 2026-04-18

Download link:

https://data.mendeley.com/datasets/pvrdx7hwhz


Description:

This dataset contains 2,001 text entries labeled for toxicity classification. Each entry is a user-generated comment with an assigned toxicity label. The dataset has two columns:

• COMMENT: a text field containing comments written primarily in Akan (Twi). The comments include expressions of gratitude, feedback, conversational messages, and general communication typical of social or online interactions.
• LABEL: a categorical variable indicating whether the comment is 'toxic' or 'non-toxic'. Current labels present in the dataset: 'non-toxic' (and any others present in the full file, if applicable).

Key features:
• Total records: 2,001
• Language: primarily Akan (Twi)
• Classification type: binary toxicity classification
• Missing values: none (both columns have 2,001 non-null entries)
• Data types: 'COMMENT': string; 'LABEL': string

This dataset can support research in:
• Toxic language detection in low-resource languages
• Natural Language Processing (NLP) for African languages
• Machine learning model training for text classification
• Sociolinguistic analysis of online conversational content

File format: CSV (Toxicity_dataset.csv), containing the two columns 'COMMENT' and 'LABEL'.
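The layout described above (two string columns, no missing values, a small set of labels) can be checked after download with a few lines of Python. The following is only a minimal sketch using the standard library: the function name `summarize_toxicity_csv` is hypothetical, and it assumes the header row reads exactly `COMMENT,LABEL`, which should be confirmed against the actual file.

```python
import csv
from collections import Counter

# Column names as given in the dataset description.
EXPECTED_COLUMNS = ["COMMENT", "LABEL"]


def summarize_toxicity_csv(handle):
    """Check the two-column layout and count rows per label.

    `handle` is any open text file or file-like object containing the CSV.
    Hypothetical helper for illustration, not part of the dataset.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")

    total = 0
    missing = 0
    label_counts = Counter()
    for row in reader:
        total += 1
        # Short rows yield None for absent fields, so normalise to "".
        comment = (row["COMMENT"] or "").strip()
        label = (row["LABEL"] or "").strip()
        if not comment or not label:
            missing += 1
        if label:
            label_counts[label] += 1

    return {"total": total, "missing": missing, "labels": dict(label_counts)}
```

To use it, pass a handle opened with `open("Toxicity_dataset.csv", newline="", encoding="utf-8-sig")`; the encoding is an assumption, since the description does not state it. If the file matches the description, the result should report 2,001 total rows and zero missing values, and the `labels` entry shows which label values are actually present.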

Created:

2026-01-09
