KOLD: Korean Offensive Language Dataset
Source: arXiv. Updated: 2022-11-05. Indexed: 2024-06-21.
Download link:
http://github.com/boychaboy/KOLD
Resource overview:
KOLD is the first offensive language dataset built specifically for Korean, created by the School of Computing at the Korea Advanced Institute of Science and Technology (KAIST). It contains 40,429 comments collected from NAVER News and YouTube. The comments are annotated hierarchically: each annotation records the type and the target of the offensive language, together with the text spans that express them. The dataset was created to address cross-cultural linguistic differences in offensive language detection. Each comment is paired with the title of its news article or video as context, which helps models identify offensive content more accurately and supports better offensive language detection for Korean.
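As a rough illustration of what a hierarchically annotated record with span labels and title context could look like, here is a minimal Python sketch. The class name, field names, label values, and the `[SEP]` separator are all hypothetical and chosen for illustration only; they are not taken from the KOLD release, whose actual schema should be checked in the repository.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Half-open [start, end) character offsets into the comment text.
Span = Tuple[int, int]


@dataclass
class AnnotatedComment:
    """Hypothetical record layout; field names are assumptions, not the KOLD schema."""
    title: str                             # article or video title used as context
    comment: str                           # the comment being labeled
    offensive: bool                        # top level: is the comment offensive?
    offensive_span: Optional[Span] = None  # where the offensive wording appears
    target_type: Optional[str] = None      # second level: who is targeted (label set assumed)
    target_span: Optional[Span] = None     # where the target is mentioned

    def validate(self) -> None:
        """Check the hierarchy: lower-level labels only make sense on offensive comments."""
        lower = (self.offensive_span, self.target_type, self.target_span)
        if not self.offensive and any(v is not None for v in lower):
            raise ValueError("non-offensive comment carries lower-level labels")
        for span in (self.offensive_span, self.target_span):
            if span is not None:
                start, end = span
                if not (0 <= start < end <= len(self.comment)):
                    raise ValueError(f"span {span} falls outside the comment")

    def span_text(self, span: Span) -> str:
        """Return the substring of the comment covered by a span."""
        return self.comment[span[0]:span[1]]


def model_input(record: AnnotatedComment, sep: str = " [SEP] ") -> str:
    """Prepend the title so a classifier sees the comment together with its context."""
    return record.title + sep + record.comment


# Placeholder English text, not an actual KOLD example.
example = AnnotatedComment(
    title="City council votes on new budget",
    comment="those people are idiots",
    offensive=True,
    offensive_span=(17, 23),
    target_type="group",
    target_span=(0, 12),
)
example.validate()
```

Keeping the lower-level fields optional mirrors the hierarchical idea: the target and span labels are only filled in once a comment has been marked offensive.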
Provided by:
School of Computing, Korea Advanced Institute of Science and Technology (KAIST)
Created:
2022-05-23



