EtiCor
Source: arXiv. Updated: 2023-10-29. Indexed: 2024-06-21.
Download link:
https://github.com/Exploration-Lab/EtiCor
Resource overview:
EtiCor is a dataset of social norms from five regions of the world: East Asia, India, the Middle East and Africa, North America and Europe, and Latin America. It contains 36,347 annotated sentences in English. Its main purpose is to evaluate how well large language models (LLMs) understand region-specific etiquette. The dataset was built by collecting text from public sources such as government websites, then preprocessing and manually annotating it. Applications include making AI systems more sensitive to and inclusive of cultural differences, and benchmarking model performance in specific cultural contexts.
Provided by:
Indian Institute of Technology Kanpur
Created:
2023-10-29
Collected summary
Dataset introduction

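Background and challenges
Background overview
EtiCor is a corpus for analyzing how well large language models understand etiquette. It contains 36,347 text examples of social norms and etiquette from five major regions of the world, each labeled as appropriate or inappropriate in its regional context. The dataset defines the 'Etiquette Sensitivity' task for evaluating a model's knowledge of region-specific etiquette, and several mainstream language models have been tested on it.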
The above content was collected and summarized by 遇见数据集.
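Because each example is labeled as appropriate or not within its region, the Etiquette Sensitivity task can be read as binary classification scored per region. The following is a minimal sketch under that reading, not the authors' evaluation code. The record keys `region`, `label`, and `pred` and the function name `score_by_region` are hypothetical, and accuracy and F1 are chosen here only as common metrics; this page does not state the dataset's file format or which metrics the authors report.

```python
from collections import defaultdict


def score_by_region(records):
    """Compute per-region accuracy and F1 for binary etiquette labels.

    `records` is an iterable of dicts with hypothetical keys:
      'region': region name (str)
      'label':  gold label, 1 = appropriate, 0 = not appropriate
      'pred':   model prediction, same encoding as 'label'
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for r in records:
        c = counts[r["region"]]
        if r["label"] == 1 and r["pred"] == 1:
            c["tp"] += 1
        elif r["label"] == 0 and r["pred"] == 1:
            c["fp"] += 1
        elif r["label"] == 1 and r["pred"] == 0:
            c["fn"] += 1
        else:
            c["tn"] += 1

    results = {}
    for region, c in counts.items():
        total = c["tp"] + c["fp"] + c["fn"] + c["tn"]
        accuracy = (c["tp"] + c["tn"]) / total
        # F1 for the "appropriate" class; defined as 0.0 when there are
        # no positive labels or predictions.
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        f1 = 2 * c["tp"] / denom if denom else 0.0
        results[region] = {"accuracy": accuracy, "f1": f1, "n": total}
    return results
```

Scoring each region separately keeps a strong result on one region from masking a weak result on another, which matters when the point of the benchmark is regional differences.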



