Amharic Hate Speech Detection Dataset
Indexed in NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/5036436
Description:
Amharic Hate Speech Detection Dataset V1
To contribute to research and development on hate speech detection for the Amharic language on social media, we are glad to release our hate speech dataset. We prepared it from the Ethiopian Broadcasting Corporation (EBC) Facebook page (https://www.facebook.com/EBCzena) and from a selected Facebook page (https://www.facebook.com/604407519910492) where we found potentially hateful comments.
We used Facepager to extract comments/posts about race, religion, and ethnicity. This produced a set of 30,000 comments covering the period from April 15, 2019 to December 15, 2019. From this set, 5,000 comments/posts were chosen at random for annotation. Three annotators (two PhD candidates in Linguistics and one MSc holder in Law) manually labeled the selected samples as “Hate” or “not-Hate”. After majority voting among the annotators, this resulted in 2,000 labeled comments (1,000 Hate and 1,000 not-Hate).
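The two selection steps described above are random sampling for annotation and majority voting over three annotators. A minimal Python sketch of both follows. It illustrates the procedure only and is not the authors' actual tooling. The function names, the fixed seed, and the exact label strings are hypothetical assumptions, and the released files may encode labels differently.

```python
import random
from collections import Counter
from typing import List, Sequence

# Label names as written in the description; the released files may differ (assumption).
LABELS = ("Hate", "not-Hate")


def sample_for_annotation(comments: Sequence[str], k: int = 5000, seed: int = 0) -> List[str]:
    """Draw k distinct comments uniformly at random (hypothetical helper; seed is illustrative)."""
    rng = random.Random(seed)
    return rng.sample(list(comments), k)


def majority_label(votes: Sequence[str]) -> str:
    """Return the label chosen by at least two of three annotators.

    With three votes over two classes, a strict majority always exists,
    so most_common(1) yields the majority label.
    """
    if len(votes) != 3:
        raise ValueError("expected exactly three annotator votes")
    for vote in votes:
        if vote not in LABELS:
            raise ValueError(f"unknown label: {vote!r}")
    label, _count = Counter(votes).most_common(1)[0]
    return label


if __name__ == "__main__":
    pool = [f"comment {i}" for i in range(30000)]  # stand-in for the 30,000 extracted comments
    batch = sample_for_annotation(pool)
    print(len(batch))  # 5000
    print(majority_label(["Hate", "not-Hate", "Hate"]))  # Hate
```

Three annotators and two classes guarantee that no ties occur. This is why a simple vote count is enough here. With more label classes or an even number of annotators, a tie-breaking rule would be needed.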
For the labeling procedure, the annotators were given three references. The first was the Ethiopian government's Hate Speech and Disinformation Prevention and Suppression Proclamation (https://www.accessnow.org/cms/assets/uploads/2020/05/Hate-Speech-and-Disinformation-Prevention-and-Suppression-Proclamation.pdf). The second was our definition of hate speech. The third was the hate speech characterization list proposed in Fino (2020) (https://doi.org/10.1093/jicj/mqaa023).
Accordingly, a comment is labeled as “Hate” when:
“the speech targets a group or individual as a member of a group (ethnicity, race, religion)”
“the speech content in the message expresses hatred”
“the speech causes a harm”
“the speaker intends harm or bad activity”
“the speech incites bad actions”
“the speech is either public or directed at a member of the group”
“the context makes violent response possible”
Created:
2021-06-28