Hate speech Dataset (Somali Language) - Sheet1.csv
Source: DataCite Commons. Updated: 2025-08-15. Indexed: 2025-09-08.
Download link:
https://figshare.com/articles/dataset/Hate_speech_Dataset_Somali_Language_-_Sheet1_csv/29920445
Resource description:
This dataset contains 8,049 text samples in the Somali language, labeled for hate speech detection. Each entry includes an ID, the original text, and a binary label indicating whether the text contains hate speech (1) or not (0). The dataset is the first publicly available resource of its kind for Somali, a low-resource language in NLP, and is intended to support research in natural language processing, computational linguistics, and machine learning for hate speech detection and content moderation.

Structure:
- ID: Unique identifier for each text sample.
- Text: The Somali-language text content.
- Label: Binary classification label: 1 for hate speech, 0 for non-hate speech.
- Modification and Unnamed: Auxiliary columns with partial or empty entries (may be ignored for modeling).

Potential Uses:
- Training and evaluating hate speech detection models.
- Research on low-resource language processing.
- Sociolinguistic analysis of harmful language in Somali.

Citation: If you use this dataset, please cite it using the DOI provided on this Figshare record.

License: CC BY 4.0 (open use with attribution)

Acknowledgment: This dataset was created to advance open research in Somali language technologies and to address gaps in low-resource NLP.
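
The column layout described above could be read with Python's standard csv module along the following lines. This is a minimal sketch, not the dataset authors' loader: it assumes the header row uses exactly the names ID, Text, and Label (the real file may differ in case or spacing), the helper names load_samples and label_counts are hypothetical, and the rows in the demo are placeholders rather than real dataset content.

```python
import csv
import io
from collections import Counter


def load_samples(source):
    """Return (id, text, label) tuples from an open CSV file object.

    Assumes header names "ID", "Text", and "Label"; any other column
    (such as the auxiliary "Modification" and "Unnamed" columns) is ignored.
    """
    samples = []
    for row in csv.DictReader(source):
        text = (row.get("Text") or "").strip()
        label = (row.get("Label") or "").strip()
        # Skip rows with empty text or a label other than 0 or 1.
        if not text or label not in {"0", "1"}:
            continue
        samples.append((row.get("ID"), text, int(label)))
    return samples


def label_counts(samples):
    """Count how many samples carry each label (1 = hate speech, 0 = not)."""
    return Counter(label for _, _, label in samples)


# Placeholder rows standing in for the real file (not actual dataset content).
demo = io.StringIO(
    "ID,Text,Label,Modification,Unnamed\n"
    "1,placeholder text a,1,,\n"
    "2,placeholder text b,0,edited,\n"
    "3,placeholder text c,,,\n"  # missing label, so this row is skipped
)
samples = load_samples(demo)
print(len(samples), dict(label_counts(samples)))
```

For the downloaded file, the same function can be given a handle from open(path, newline="", encoding="utf-8-sig"), where the encoding is an assumption to verify against the actual file. Checking the label counts before training is a cheap way to spot class imbalance.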
Provider:
figshare
Created:
2025-08-15



