KoBBQ
Source: arXiv. Updated: 2024-01-25. Indexed: 2024-06-21.
Download link:
https://jinjh0123.github.io/KoBBQ
Description:
KoBBQ is a dataset for evaluating the social biases of language models in the context of Korean culture. Created by the Korea Advanced Institute of Science and Technology (KAIST), it contains 76,048 samples covering 12 social bias categories. The social biases and target groups that reflect stereotypes in Korean culture were collected and validated through a large-scale survey. To build the dataset, the original BBQ samples were divided into three groups: simply transferred samples that can be used directly, target-modified samples that require localization, and removed samples that do not fit Korean culture. Four new categories reflecting biases specific to Korean culture were then added. With this culturally adapted dataset, KoBBQ aims to measure the inherent social biases of language models more accurately and comprehensively, and it is particularly suited to evaluating and understanding bias in Korean-language settings.
Provider:
Korea Advanced Institute of Science and Technology (KAIST)
Created:
2023-07-31



