KOSBI

Name: KOSBI
Creator: NAVER AI Lab
Published: 2023-05-30 09:42:07
License: No description available

Source: arXiv. Updated: 2023-05-30. Indexed: 2024-06-21.

Download link:

https://github.com/naver-ai/korean-safety-benchmarks

Description:

KOSBI is a large-scale social bias dataset tailored to the Korean language and culture. It contains 34,214 context-sentence pairs covering 72 demographic groups across 15 categories. The data was generated with HyperCLOVA and then annotated by crowdworkers to ensure safety and validity. Filter-based moderation built on the dataset reduced social bias in generated content by 16.47% on average. Its main application is improving the safety and fairness of large language models (LLMs) in Korean-specific social and cultural contexts.
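
The filtering idea mentioned in the description can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes some classifier trained on KOSBI that returns an "unsafe" score between 0 and 1 for a sentence. The names `filter_unsafe`, `unsafe_rate`, and `toy_unsafe_score`, and the 0.5 threshold, are hypothetical, and the toy scorer is only a stand-in for a real model.

```python
from typing import Callable, Iterable, List


def filter_unsafe(
    candidates: Iterable[str],
    unsafe_score: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the source
) -> List[str]:
    """Keep only the candidates scored below the unsafe threshold."""
    return [text for text in candidates if unsafe_score(text) < threshold]


def unsafe_rate(
    texts: Iterable[str],
    unsafe_score: Callable[[str], float],
    threshold: float = 0.5,
) -> float:
    """Fraction of texts scored at or above the unsafe threshold."""
    texts = list(texts)
    if not texts:
        return 0.0
    flagged = sum(1 for text in texts if unsafe_score(text) >= threshold)
    return flagged / len(texts)


def toy_unsafe_score(text: str) -> float:
    """Hypothetical placeholder for a classifier trained on KOSBI."""
    return 1.0 if "BIASED" in text else 0.0


if __name__ == "__main__":
    generations = ["a neutral sentence", "a BIASED sentence"]
    kept = filter_unsafe(generations, toy_unsafe_score)
    print("unsafe rate before filtering:", unsafe_rate(generations, toy_unsafe_score))
    print("unsafe rate after filtering:", unsafe_rate(kept, toy_unsafe_score))
```

Comparing the unsafe rate before and after filtering is one simple way to express a reduction like the one reported above, although the exact metric behind the 16.47% figure is not specified here.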

Provider:

NAVER AI Lab

Created:

2023-05-28
