kellycyy/CulturalBench

Name: kellycyy/CulturalBench
Creator: kellycyy
Published: 2024-04-11 00:54:23
License: apache-2.0

Hugging Face | Updated 2024-04-11 | Indexed 2024-04-19

Download link:

https://hf-mirror.com/datasets/kellycyy/CulturalBench


Resource description:

---
license: apache-2.0
task_categories:
- question-answering
- text-generation
language:
- en
tags:
- culture
pretty_name: culturalbench
size_categories:
- n<1K
---

# CulturalBench-v0.1: Evaluation data collected from CulturalTeaming -- AI-Assisted Interactive Red-Teaming for Challenging LLM on Multicultural Knowledge

CulturalTeaming is an interactive red-teaming system that uses human-AI collaboration to collect a challenging dataset for assessing LLMs’ multicultural knowledge. Through workshop sessions in our user studies, we gathered users’ red-teaming attempts to form a compact yet high-quality evaluation dataset, CULTURALBENCH-V0.1.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65fcaae6e5dc5b0ec1b726cf/DzFYxu13xtfqnb_c8matH.png)

## Quick Links:

- [Platform (Click to play!)](https://cultural-norms-demo-b-team.apps.allenai.org/)
- [HF Dataset](https://huggingface.co/datasets/kellycyy/CulturalBench)
- [Paper](https://arxiv.org/abs/2404.06664)

- **Language(s) (NLP):** English
- **Point of Contact:** [Kelly Chiu](mailto:kellyc@allenai.org)

## Data Schema Description

- `question_idx`: (int) Identifier for each entry.
- `initial_question_template`: (str) Annotator’s initial draft of the multiple-choice question (MCQ).
- `final_question_template`: (str) Annotator’s final version of the MCQ, manually reviewed to ensure that it follows the MCQ format and names the culture being asked about. This is the question template used to evaluate the different models.
- `correct_ans`: (str) The correct answer the annotator logged for their drafted MCQ.
- `correct_ans_reason`: (str) The reason the annotator logged for the correct answer to their drafted MCQ.
- `culture_represent`: (str) The culture the annotator logged as represented by the MCQ.
- `culture_group_geographic`: (str) Geographic grouping derived from `culture_represent`.
- `feedback_familiar_on_culture`: (int) The annotator's familiarity with the culture represented in their drafted MCQ. The question is `How familiar are you with the represented culture? on a scale of 1 (unfamiliar) to 5 (familiar)`
- `feedback_question_common`: (int) The annotator's perception of how common the situation embedded in their drafted MCQ is. The question is `How common is the situation in the represented culture? on a scale of 1 (rare) to 5 (always)`
- `feedback_question_difficult`: (int) The annotator's perception of the difficulty of their drafted MCQ. The question is `How common is the situation in the represented culture? on a scale of 1 (rare) to 5 (always)`
- `country_longest_living`: (str) Annotator demographic: the country/area, apart from the United States (US), where the annotator lived longest while growing up. `NA` for US. The question is `Apart from the US, which country/area did you live in the longest growing up?`
- `year_for_country_longest_living`: (str) Annotator demographic: the number of years lived in `country_longest_living`. `NA` for US. The question is `How long have lived in the above country/area?`
- `country_more_than_5_year`: (str) Annotator demographic: the country/area where the annotator has lived for more than 5 years. The question is `In which country/area have you lived for more than five years?`
- `country_more_than_1_year`: (str) Annotator demographic: the country/area where the annotator has lived for more than 1 year. The question is `In which country/area have you lived for more than one year?`
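As a rough illustration of the schema described above, the following sketch mirrors the listed fields in a Python dataclass and checks that a row is complete. The class name `CulturalBenchRecord`, the helper `validate_record`, and every value in `example_row` are hypothetical placeholders, not real dataset content. Treating all three `feedback_*` fields as 1-to-5 ratings is an assumption based on the scales quoted in the field descriptions.

```python
from dataclasses import dataclass, fields

# Fields assumed to hold 1-5 ratings, per the scales quoted in the schema.
LIKERT_FIELDS = (
    "feedback_familiar_on_culture",
    "feedback_question_common",
    "feedback_question_difficult",
)


@dataclass
class CulturalBenchRecord:
    """Hypothetical container mirroring the documented column names and types."""
    question_idx: int
    initial_question_template: str
    final_question_template: str
    correct_ans: str
    correct_ans_reason: str
    culture_represent: str
    culture_group_geographic: str
    feedback_familiar_on_culture: int
    feedback_question_common: int
    feedback_question_difficult: int
    country_longest_living: str
    year_for_country_longest_living: str
    country_more_than_5_year: str
    country_more_than_1_year: str


def validate_record(row: dict) -> CulturalBenchRecord:
    """Build a record from a row, rejecting missing fields or out-of-range ratings."""
    expected = [f.name for f in fields(CulturalBenchRecord)]
    missing = [name for name in expected if name not in row]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    record = CulturalBenchRecord(**{name: row[name] for name in expected})
    for name in LIKERT_FIELDS:
        value = getattr(record, name)
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{name} must be an int from 1 to 5, got {value!r}")
    return record


# Placeholder row: all values are invented for illustration only.
example_row = {
    "question_idx": 0,
    "initial_question_template": "placeholder initial MCQ text",
    "final_question_template": "placeholder final MCQ text",
    "correct_ans": "A",
    "correct_ans_reason": "placeholder reason",
    "culture_represent": "ExampleCulture",
    "culture_group_geographic": "ExampleRegion",
    "feedback_familiar_on_culture": 5,
    "feedback_question_common": 4,
    "feedback_question_difficult": 3,
    "country_longest_living": "NA",
    "year_for_country_longest_living": "NA",
    "country_more_than_5_year": "ExampleCountry",
    "country_more_than_1_year": "ExampleCountry",
}

record = validate_record(example_row)
print(record.question_idx, record.culture_represent)
```

Rows could come from any loader that yields dictionaries keyed by column name, for instance the Hugging Face `datasets` library pointed at `kellycyy/CulturalBench`. The split names and the exact format of `correct_ans` are not stated in this description, so a check like this is best treated as a starting point and adjusted against the actual data.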

Provided by:

kellycyy

Summary of Original Information