
kellycyy/CulturalBench

Source: Hugging Face. Updated 2024-04-11. Indexed 2024-04-19.
Download link:
https://hf-mirror.com/datasets/kellycyy/CulturalBench
Resource description:
---
license: apache-2.0
task_categories:
- question-answering
- text-generation
language:
- en
tags:
- culture
pretty_name: culturalbench
size_categories:
- n<1K
---

# CulturalBench-v0.1: Evaluation data collected from CulturalTeaming -- AI-Assisted Interactive Red-Teaming for Challenging LLM on Multicultural Knowledge

CulturalTeaming is an interactive red-teaming system that uses human-AI collaboration to collect a challenging dataset for assessing LLMs’ multicultural knowledge. Through workshop sessions in our user studies, we gather users’ red-teaming attempts to form a compact yet high-quality evaluation dataset, CULTURALBENCH-V0.1.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65fcaae6e5dc5b0ec1b726cf/DzFYxu13xtfqnb_c8matH.png)

## Quick Links:
- [Platform (Click to play!)](https://cultural-norms-demo-b-team.apps.allenai.org/)
- [HF Dataset](https://huggingface.co/datasets/kellycyy/CulturalBench)
- [Paper](https://arxiv.org/abs/2404.06664)

- **Language(s) (NLP):** English
- **Point of Contact:** [Kelly Chiu](mailto:kellyc@allenai.org)

## Data Schema Description
- `question_idx`: (int) Identifier for each entry.
- `initial_question_template`: (str) Annotator's initial draft of the multiple-choice question (MCQ).
- `final_question_template`: (str) Annotator's final version of the MCQ, manually reviewed to ensure that it follows the MCQ format and names the culture being asked about. This is the question template used to evaluate the different models.
- `correct_ans`: (str) The correct answer the annotator logged for their drafted MCQ.
- `correct_ans_reason`: (str) The reason the annotator logged for the correct answer to their drafted MCQ.
- `culture_represent`: (str) The culture the annotator logged as represented by the MCQ.
- `culture_group_geographic`: (str) Geographic grouping derived from `culture_represent`.
- `feedback_familiar_on_culture`: (int) The annotator's familiarity with the culture represented in their drafted MCQ. The question is `How familiar are you with the represented culture? on a scale of 1 (unfamiliar) to 5 (familiar)`
- `feedback_question_common`: (int) The annotator's perception of how common the situation embedded in their drafted MCQ is. The question is `How common is the situation in the represented culture? on a scale of 1 (rare) to 5 (always)`
- `feedback_question_difficult`: (int) The annotator's perception of the difficulty of their drafted MCQ. The question is `How common is the situation in the represented culture? on a scale of 1 (rare) to 5 (always)`
- `country_longest_living`: (str) The country/area, other than the United States (US), where the annotator lived longest while growing up. `NA` for US. The question is `Apart from the US, which country/area did you live in the longest growing up?`
- `year_for_country_longest_living`: (str) The number of years the annotator lived in `country_longest_living`. `NA` for US. The question is `How long have lived in the above country/area?`
- `country_more_than_5_year`: (str) The country/area where the annotator has lived for more than five years. The question is `In which country/area have you lived for more than five years?`
- `country_more_than_1_year`: (str) The country/area where the annotator has lived for more than one year. The question is `In which country/area have you lived for more than one year?`
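
The schema above can be sketched as a typed record with a small validator. This is a minimal sketch and not part of the dataset's own tooling: the names `CulturalBenchRow`, `validate_row`, and `sample`, along with all placeholder values, are hypothetical. It assumes the three `feedback_*` fields hold integers on the 1 to 5 scale described in the schema, and that the two `NA` markers for US annotators always appear together.

```python
from dataclasses import dataclass, fields

# Hypothetical record type mirroring the field names and types in the schema.
@dataclass
class CulturalBenchRow:
    question_idx: int
    initial_question_template: str
    final_question_template: str
    correct_ans: str
    correct_ans_reason: str
    culture_represent: str
    culture_group_geographic: str
    feedback_familiar_on_culture: int
    feedback_question_common: int
    feedback_question_difficult: int
    country_longest_living: str
    year_for_country_longest_living: str
    country_more_than_5_year: str
    country_more_than_1_year: str

# Fields the schema describes as 1-to-5 ratings.
LIKERT_FIELDS = (
    "feedback_familiar_on_culture",
    "feedback_question_common",
    "feedback_question_difficult",
)

def validate_row(row: CulturalBenchRow) -> list[str]:
    """Return a list of problems found in one row (empty if none)."""
    problems = []
    # Check each value against the declared type (int or str).
    for f in fields(row):
        value = getattr(row, f.name)
        if not isinstance(value, f.type):
            problems.append(
                f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}"
            )
    # Check that the ratings fall on the 1-5 scale.
    for name in LIKERT_FIELDS:
        value = getattr(row, name)
        if isinstance(value, int) and not 1 <= value <= 5:
            problems.append(f"{name}: {value} is outside the 1-5 scale")
    # Assumption: "NA" (US annotators) appears in both fields or in neither.
    na_country = row.country_longest_living == "NA"
    na_years = row.year_for_country_longest_living == "NA"
    if na_country != na_years:
        problems.append(
            "country_longest_living and year_for_country_longest_living "
            "should both be NA or neither"
        )
    return problems

# Placeholder values for illustration only; not taken from the dataset.
sample = dict(
    question_idx=0,
    initial_question_template="<initial question text>",
    final_question_template="<final question text>",
    correct_ans="<answer>",
    correct_ans_reason="<reason>",
    culture_represent="<culture>",
    culture_group_geographic="<region>",
    feedback_familiar_on_culture=5,
    feedback_question_common=4,
    feedback_question_difficult=3,
    country_longest_living="NA",
    year_for_country_longest_living="NA",
    country_more_than_5_year="<country>",
    country_more_than_1_year="<country>",
)
row = CulturalBenchRow(**sample)
print(validate_row(row))
```

A validator like this is mainly useful as a sanity check after loading the data, for example to catch ratings that were parsed as strings.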
Provider:
kellycyy
Summary of original information

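CulturalBench-v0.1 Dataset Overview

Basic Information

  • License: Apache-2.0
  • Task categories:
    • Question answering
    • Text generation
  • Language: English
  • Tags: culture
  • Dataset name: culturalbench
  • Dataset size: fewer than 1,000 records

Dataset Description

CulturalBench-v0.1 is a dataset for evaluating the multicultural knowledge of large language models (LLMs). It was built with the CulturalTeaming system, which relies on human-AI collaboration: users' red-teaming attempts were collected in workshop sessions during user studies and assembled into a compact, high-quality evaluation dataset.

Data Structure

The dataset contains the following fields:

  • question_idx: Identifier for each entry (integer).
  • initial_question_template: Annotator's initial draft of the question (multiple-choice question, MCQ).
  • final_question_template: Annotator's final version of the MCQ, manually reviewed to ensure it follows the MCQ format and names the culture being asked about.
  • correct_ans: The correct answer logged by the annotator.
  • correct_ans_reason: The reason the annotator logged for the correct answer.
  • culture_represent: The represented culture logged by the annotator.
  • culture_group_geographic: Geographic grouping based on culture_represent.
  • feedback_familiar_on_culture: Annotator's familiarity with the represented culture (1 = unfamiliar to 5 = familiar).
  • feedback_question_common: Annotator's perception of how common the embedded situation is (1 = rare to 5 = always).
  • feedback_question_difficult: Annotator's perception of the question's difficulty (1 = rare to 5 = always).
  • country_longest_living: The country/area, other than the US, where the annotator lived longest.
  • year_for_country_longest_living: Number of years lived in the country_longest_living country/area.
  • country_more_than_5_year: Country/area where the annotator has lived for more than 5 years.
  • country_more_than_1_year: Country/area where the annotator has lived for more than 1 year.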
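
As an illustration of how the fields listed above might be used together, the sketch below scores model answers against `correct_ans` and reports accuracy per `culture_group_geographic`. It is a sketch under stated assumptions, not the authors' evaluation procedure: the function names `normalize` and `accuracy_by_group`, the exact-match comparison after trimming and case-folding, and every row and prediction value shown are hypothetical placeholders rather than dataset contents.

```python
from collections import defaultdict

def normalize(answer: str) -> str:
    """Trim whitespace and ignore case before comparing answers (an assumption)."""
    return answer.strip().casefold()

def accuracy_by_group(rows, predictions):
    """Compute exact-match accuracy per geographic group.

    rows: iterable of dicts with question_idx, correct_ans,
          and culture_group_geographic keys.
    predictions: dict mapping question_idx to the model's answer string.
    A question with no prediction counts as incorrect.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        group = row["culture_group_geographic"]
        total[group] += 1
        pred = predictions.get(row["question_idx"])
        if pred is not None and normalize(pred) == normalize(row["correct_ans"]):
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}

# Placeholder rows for illustration only; not taken from the dataset.
rows = [
    {"question_idx": 0, "correct_ans": "A", "culture_group_geographic": "Group 1"},
    {"question_idx": 1, "correct_ans": "B", "culture_group_geographic": "Group 1"},
    {"question_idx": 2, "correct_ans": "C", "culture_group_geographic": "Group 2"},
]
# Hypothetical model outputs keyed by question_idx.
predictions = {0: " a ", 1: "D", 2: "C"}

print(accuracy_by_group(rows, predictions))
```

Grouping by `culture_group_geographic` rather than `culture_represent` keeps the per-group counts larger, which matters for a dataset with fewer than 1,000 records.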