
HOCON34k: A Corpus of Hate speech in Online Comments from German Newspapers

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/12665947
Resource description:
We have compiled a dataset containing 34,223 comments in German, authored by users of online platforms associated with public discourse in German newspapers. Each comment was annotated by a group of 29 volunteers for hate speech and for the adequacy of contextual information, using a binary annotation approach. The inter-rater reliability for hate speech, measured by Fleiss’ Kappa, is 0.4428 across all annotators and increases to 0.6078 for an optimized subset of 12 annotators. Additionally, we present a baseline text classification using BERT, achieving an MCC of up to 0.32 and an F2-score of up to 0.64 in our initial experiment on this new corpus. The dataset, named HOCON34k, comprises German hate speech comments from newspapers and is publicly available for research purposes.
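For readers unfamiliar with the metrics named in the description, the following is a minimal sketch of how Fleiss' Kappa, the Matthews correlation coefficient (MCC), and the F2-score are conventionally computed. It uses the textbook definitions only; it is not the authors' evaluation code, and the function names and toy inputs are assumptions made for illustration. The textbook Fleiss' Kappa also assumes every item received the same number of ratings, which may not match how the corpus was annotated.

```python
from math import sqrt


def fleiss_kappa(counts):
    """Fleiss' Kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j (hypothetical input layout).
    Every item must have the same total number of ratings."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of ratings")
    n_cats = len(counts[0])
    # Share of all assignments that went to each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(n_cats)]
    # Observed pairwise agreement per item, then averaged.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy table: 2 comments, 3 raters, columns = [not hate, hate].
    print(fleiss_kappa([[2, 1], [1, 2]]))
    print(mcc(tp=5, fp=1, fn=2, tn=10))
    print(f_beta(tp=8, fp=4, fn=2))
```

The F2-score is a natural choice when missing a hateful comment is considered costlier than a false alarm, while MCC remains informative under class imbalance because it uses all four confusion-matrix cells.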
Created:
2024-12-02