Hate speech Dataset (Somali Language) - Sheet1.csv
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Hate_speech_Dataset_Somali_Language_-_Sheet1_csv/29920445
Description:
This dataset contains 8,049 text samples in the Somali language, labeled for hate speech detection. Each entry includes an ID, the original text, and a binary label indicating whether the text contains hate speech (1) or not (0). The dataset is the first publicly available resource of its kind for Somali, a low-resource language in NLP, and is intended to support research in natural language processing, computational linguistics, and machine learning for hate speech detection and content moderation.
Structure:
ID: Unique identifier for each text sample.
Text: The Somali-language text content.
Label: Binary classification label (1 for hate speech, 0 for non-hate speech).
Modification and Unnamed: Auxiliary columns with partial or empty entries (may be ignored for modeling).
Potential Uses:
Training and evaluating hate speech detection models.
Research on low-resource language processing.
Sociolinguistic analysis of harmful language in Somali.
Citation: If you use this dataset, please cite it using the DOI provided on this Figshare record.
License: CC BY 4.0 (open use with attribution)
Acknowledgment: This dataset was created to advance open research in Somali language technologies and to address gaps in low-resource NLP.
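Example: The column layout listed under Structure could be read roughly as in the sketch below. It uses only the Python standard library, keeps the ID, Text and Label columns, and skips rows whose label is not exactly 0 or 1 or whose text is empty. The header spellings ("ID", "Text", "Label") are taken from the description above and are an assumption about the actual CSV header; the function names load_samples and label_counts are hypothetical helpers, not part of the dataset.

```python
import csv
from collections import Counter

# Assumed header names, copied from the Structure description;
# the real CSV header may differ in spelling or case.
ID_COL, TEXT_COL, LABEL_COL = "ID", "Text", "Label"


def load_samples(handle):
    """Read (id, text, label) triples from an open CSV file handle.

    Rows with an empty text or a label other than "0"/"1" are skipped.
    The auxiliary Modification and Unnamed columns are ignored.
    """
    samples = []
    for row in csv.DictReader(handle):
        label = (row.get(LABEL_COL) or "").strip()
        text = (row.get(TEXT_COL) or "").strip()
        if label not in {"0", "1"} or not text:
            continue  # incomplete or malformed row
        samples.append((row.get(ID_COL), text, int(label)))
    return samples


def label_counts(samples):
    """Count samples per label (1 = hate speech, 0 = non-hate speech)."""
    return Counter(label for _, _, label in samples)
```

To use it, open the downloaded file with open(path, newline="", encoding="utf-8") and pass the handle to load_samples; UTF-8 encoding is also an assumption. Checking label_counts before training is a cheap way to see how balanced the two classes are.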
Created:
2025-08-15



