K-MHaS
Source: arXiv · Updated 2022-09-30 · Indexed 2024-06-21
Download link:
https://github.com/adlnlp/KMHaS
Description:
K-MHaS is a large-scale Korean multi-label hate speech detection dataset created by the University of Sydney and other institutions. It contains 109,692 comments drawn from South Korean online news. To handle Korean language patterns it uses multi-label classification: each comment carries 1 to 4 labels, which lets the annotation account for subjectivity and intersectionality. The dataset was built by randomly selecting comments, preprocessing them, and annotating them with multiple labels. Its fine-grained categories are meant to reflect social and historical context, particularly in areas such as politics and gender. K-MHaS is intended for automated hate speech detection and content moderation, and it aims to address the shortage of hate speech detection resources for non-English languages.
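As an illustration of the 1-to-4-label scheme described above, the following minimal Python sketch turns a comment's label set into a multi-hot vector, the usual input format for a multi-label classifier. It is not taken from the K-MHaS repository: the function name `to_multi_hot` and the `LABELS` list are assumptions, and only "politics" and "gender" come from the description, with the remaining entries being placeholders. The actual label scheme should be checked against the repository.

```python
from typing import List, Sequence

# Hypothetical label vocabulary. Only "politics" and "gender" are named in the
# dataset description; the other entries are placeholders, not real label names.
LABELS = ["politics", "gender", "placeholder_a", "placeholder_b", "placeholder_c"]


def to_multi_hot(labels: Sequence[str], vocab: Sequence[str] = LABELS) -> List[int]:
    """Encode a comment's labels as a 0/1 vector aligned with `vocab`.

    Enforces the constraint stated in the description: each comment
    carries between 1 and 4 labels.
    """
    unique = set(labels)
    if not 1 <= len(unique) <= 4:
        raise ValueError(f"expected 1 to 4 labels, got {len(unique)}")
    unknown = unique - set(vocab)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # One position per vocabulary entry; 1 marks a label that applies.
    return [1 if name in unique else 0 for name in vocab]


if __name__ == "__main__":
    # A comment annotated with two labels at once (the intersectional case).
    print(to_multi_hot(["politics", "gender"]))
```

A multi-hot vector keeps each label as an independent yes/no decision, so a single comment can be marked for several categories at once instead of being forced into one class.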
Provided by:
University of Sydney
Created:
2022-08-23



