Korean Online Hate Speech Dataset for Multilabel Classification
Source: arXiv | Updated: 2022-04-08 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2204.03262v2
Resource description:
This dataset covers seven categories of hate speech: (1) race and nationality, (2) religion, (3) regionalism, (4) ageism, (5) misogyny, (6) sexual minorities, and (7) men. It contains 35K entries in total: 24K online comments, labeled with a Krippendorff's alpha inter-annotator agreement of 0.713; 2.2K neutral sentences from Wikipedia; 1.7K additionally labeled sentences produced through a human-in-the-loop procedure; and 7.1K rule-generated neutral sentences.
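Because the task is multilabel, a single comment can belong to several of the seven categories at once, and a neutral sentence belongs to none. The following is a minimal sketch of one common way to represent such labels as a multi-hot vector, together with a check that the stated component sizes add up to the 35K total. The category identifiers, the dictionary keys, and the `to_multi_hot` helper are hypothetical names chosen for illustration; the released files may use a different schema.

```python
# Hypothetical identifiers for the seven categories in the description;
# the seventh stands for hate speech directed at men.
CATEGORIES = [
    "race_nationality",
    "religion",
    "regionalism",
    "ageism",
    "misogyny",
    "sexual_minorities",
    "men",
]


def to_multi_hot(labels):
    """Return a 7-element 0/1 list; an all-zero list means a neutral sentence."""
    unknown = set(labels) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if category in labels else 0 for category in CATEGORIES]


# Component sizes in thousands of entries, as stated in the description.
COMPOSITION_K = {
    "online_comments": 24.0,
    "wikipedia_neutral": 2.2,
    "human_in_the_loop": 1.7,
    "rule_generated_neutral": 7.1,
}

# Round to one decimal to absorb floating-point noise in the sum.
total_k = round(sum(COMPOSITION_K.values()), 1)

if __name__ == "__main__":
    print(to_multi_hot(["religion", "misogyny"]))
    print(to_multi_hot([]))
    print(total_k)
```

A multi-hot target of this shape is what a classifier with one sigmoid output per category would typically be trained against, which is why neutral sentences need no separate class.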
Created:
2022-04-07



