notrichardren/easy_qa

Name: notrichardren/easy_qa
Creator: notrichardren
Published: 2023-06-26 12:33:45
License: 暂无描述

Hugging Face2023-06-26 更新2024-03-04 收录

下载链接：

https://hf-mirror.com/datasets/notrichardren/easy_qa

下载链接

链接失效反馈

官方服务：

资源简介：

--- license: apache-2.0 task_categories: - question-answering language: - en pretty_name: Easy Question Answer --- # EasyQA: A Kindergarten-Level QA Dataset for Investigating Truthfulness. EasyQA is a GPT-3.5-turbo-generated dataset of easy kindergarten-level facts, meant to be used to prompt and evaluate large language models for "common-sense" truthful responses. This dataset was originally created to understand how different types of truthfulness may be represented in the intermediate activations of large language models. EasyQA compromises 2346 questions that span 50 categories, including art, technology, education, music, and animals. The questions are meant to be extremely simple and obvious, eliciting an obvious truth that would not be susceptible to misconceptions -- making it an excellent comparison compared to benchmarks related to other types of truth (e.g. TruthfulQA, which focuses on common misconceptions). Credits to Kevin Wang, Richard Ren, and Phillip Guo. ## Dataset Creation The dataset was created by prompting GPT-3.5-turbo with: "*Please generate 50 easy, obvious, common-knowledge questions that a kindergartener would learn in class about the topic prompted, as well as correct and incorrect responses. These questions should be less like trivia questions (i.e. Who is known as the Queen of Jazz?) and more like obvious facts (ie What color is the sky?). Your generations should be in the format: Question: {Your question here} Right: {Right answer} Wrong: {Wrong answer} where each question is a new line. Please follow this format verbatim (e.g. do not number the questions).*" The following categories were used: ``` Animals Plants Food and drink Music Movies Television shows Literature Sports Geography History Science Mathematics Art Technology Politics Business and Economy Education Health and Fitness Environment and Climate Space and Astronomy Fashion and Style Video Games Travel and Tourism Language and Literature Religion and Spirituality Famous Personalities Cultural Events/Festivals Cars and Automobiles Photography Architecture Medicine and Health Psychology Philosophy Law Social Sciences Human Rights Current Events/News Global Affairs National Landmarks Celebrities and Entertainment Nature Cooking and Baking Gardening DIY Projects Dance Comic Books and Graphic Novels Mythology and Folklore Internet and Social Media Parenting and Family Life Home Decor ```

提供机构：

notrichardren

原始信息汇总

数据集概述

数据集名称

EasyQA: A Kindergarten-Level QA Dataset for Investigating Truthfulness

数据集目的

用于评估大型语言模型对“常识”真实性响应的能力。
旨在理解不同类型的真实性如何在大型语言模型的中间激活中表现。

数据集内容

包含2346个问题，涵盖50个类别，如艺术、技术、教育、音乐和动物等。
问题设计为极其简单和明显，以引出不易产生误解的明显事实。

数据集创建

通过向GPT-3.5-turbo发出特定提示生成，要求生成50个简单、明显的常识问题，每个问题包含正确和错误答案。

数据集类别

包括动物、植物、食物和饮料、音乐等50个类别。

数据集语言

英语（en）

许可证

Apache-2.0

5,000+

优质数据集

54 个

任务类型

进入经典数据集