Toxicity Detection Dataset in Twi Language
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/pvrdx7hwhz
Description:
This dataset contains 2,001 text entries labeled for toxicity classification. Each entry represents a user-generated comment along with an assigned toxicity label. The dataset is structured into two columns:
COMMENT: A text field containing comments written primarily in Akan (Twi). These comments include expressions of gratitude, feedback, conversational messages, and general communication typical of social or online interactions.
LABEL: A categorical variable indicating whether the comment is 'toxic' or 'non-toxic'.
Current labels present in the dataset: 'non-toxic' (and any others present in the full file, if applicable).
Key Features:
• Total records: 2,001
• Language: Primarily Akan (Twi)
• Classification type: Binary toxicity classification
• Missing values: none (both columns have 2,001 non-null entries)
• Data types: 'COMMENT' is a string; 'LABEL' is a string
This dataset can support research in:
• Toxic language detection in low-resource languages
• Natural Language Processing (NLP) for African languages
• Machine learning model training for text classification
• Sociolinguistic analysis of online conversational content
File format: CSV (Toxicity_dataset.csv), with two columns: 'COMMENT' and 'LABEL'.
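The structure described above (two columns, no missing values, string labels) can be checked after download with a short script. The following is a minimal sketch using only the Python standard library. The function name `summarize` and the in-memory sample rows are hypothetical placeholders, not taken from the dataset, and the file encoding shown in the closing comment is an assumption.

```python
import csv
import io
from collections import Counter

# Column names as stated in the dataset description.
EXPECTED_COLUMNS = ["COMMENT", "LABEL"]


def summarize(handle):
    """Read a COMMENT/LABEL CSV and return (row_count, label_counts, rows_with_missing)."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")

    label_counts = Counter()
    rows = 0
    rows_with_missing = 0
    for row in reader:
        rows += 1
        # DictReader yields None for fields absent from a short row.
        comment = (row["COMMENT"] or "").strip()
        label = (row["LABEL"] or "").strip()
        if not comment or not label:
            rows_with_missing += 1
        if label:
            label_counts[label] += 1
    return rows, dict(label_counts), rows_with_missing


# Placeholder rows for illustration only; these are NOT entries from the dataset.
sample = io.StringIO(
    "COMMENT,LABEL\n"
    "placeholder comment one,non-toxic\n"
    "placeholder comment two,toxic\n"
    "placeholder comment three,non-toxic\n"
)
rows, label_counts, missing = summarize(sample)
print(rows, label_counts, missing)

# For the real file (the encoding is an assumption; "utf-8-sig" also accepts plain UTF-8):
# with open("Toxicity_dataset.csv", newline="", encoding="utf-8-sig") as f:
#     print(summarize(f))
```

If the description is accurate, running this on the downloaded file should report 2,001 rows and no rows with missing fields. The label counts show which of the 'toxic' and 'non-toxic' classes actually occur in the file.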
Created:
2026-01-09



