BLEND
arXiv, updated 2024-06-14, indexed 2024-06-18
Download link:
https://github.com/nlee0212/BLEnD
Description:
BLEND is a multicultural, multilingual dataset created jointly by the Korea Advanced Institute of Science and Technology (KAIST) and other institutions. It contains 52.6k question-answer pairs covering 16 countries and regions in 13 languages. Carefully designed question templates and localized translations are used to keep the questions culturally relevant and diverse. Local annotators were recruited to collect, filter, and translate the questions and to annotate the answers, with the aim of keeping the data authentic and accurate. BLEND is intended mainly for evaluating how well large language models (LLMs) handle everyday cultural knowledge, particularly their cultural sensitivity in non-English and low-resource languages, and it targets the knowledge bias that existing models show in cross-cultural and cross-lingual settings.
Provided by:
Korea Advanced Institute of Science and Technology (KAIST)
Created:
2024-06-14



