
Camer-Hate-FR: An Annotated Dataset for Hate Speech Detection in Cameroonian French.

Source: DataCite Commons. Updated 2026-04-23; indexed 2026-05-04.
Download link:
https://data.mendeley.com/datasets/rjwttgp23m
Description:
Camer-Hate-FR is a resource for detecting hate speech in the distinct linguistic context of Cameroonian French. The data consists of 46,825 messages collected between January and June 2025 from public Cameroonian social media sources, including Facebook pages, YouTube channels, and WhatsApp groups.

Existing hate speech detection models, primarily trained on standard European French, perform poorly on Cameroonian data because of the prevalent use of local slang, code-switching with English and indigenous languages (Camfranglais), and nuanced cultural contexts. This dataset was created to address that gap.

Each message has been manually annotated by three native speakers as either 'hateful' or 'non-hateful', with the final label determined by majority vote. Each entry includes the original text, annotation counts, the final vote, and the justifications provided by the annotators. All data has been fully anonymized to protect user privacy.

The dataset is provided in three versions:

camer_hate_fr_dataset.csv: original Cameroonian French version, with labels haineux / non_haineux
cameroon_hate_speech_UK_English.csv: full translation in British English (spelling: recognise, offence, cancelled), with labels hateful / non_hateful
cameroon_hate_speech_US_English.csv: full translation in American English (spelling: recognize, offense, canceled), with labels hateful / non_hateful

This resource is designed to train, validate, and benchmark machine learning models for content moderation, to facilitate sociolinguistic analysis, and to spur the development of more inclusive and effective NLP technologies for Francophone Africa.
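
The majority-vote labeling described above can be illustrated with a minimal Python sketch. It is not the authors' annotation pipeline: the function name `majority_label`, the example votes, and the `FR_TO_EN` mapping are assumptions made for illustration, and the actual CSV column names are not documented here. Only the label strings (haineux / non_haineux and hateful / non_hateful) come from the description.

```python
from collections import Counter

# Label strings taken from the dataset description; the mapping itself
# is an illustrative assumption about how the versions correspond.
FR_TO_EN = {"haineux": "hateful", "non_haineux": "non_hateful"}


def majority_label(votes):
    """Return the label chosen by most annotators.

    With three annotators and two possible labels, a tie cannot occur,
    so the most common label is always the majority label.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Hypothetical annotations for three messages (not real dataset rows).
annotations = [
    ["haineux", "haineux", "non_haineux"],
    ["non_haineux", "non_haineux", "non_haineux"],
    ["non_haineux", "haineux", "non_haineux"],
]

final_fr = [majority_label(v) for v in annotations]
final_en = [FR_TO_EN[label] for label in final_fr]
print(final_fr)
print(final_en)
```

Recomputing the vote this way could serve as a consistency check against the final-vote field shipped with each entry, once the real column names are known.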
Provider:
Mendeley Data
Created:
2025-12-04