notrichardren/easy_qa
收藏Hugging Face2023-06-26 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/notrichardren/easy_qa
下载链接
链接失效反馈官方服务:
资源简介:
---
license: apache-2.0
task_categories:
- question-answering
language:
- en
pretty_name: Easy Question Answer
---
# EasyQA: A Kindergarten-Level QA Dataset for Investigating Truthfulness.
EasyQA is a GPT-3.5-turbo-generated dataset of easy kindergarten-level facts, meant to be used to prompt and evaluate large language models for "common-sense" truthful responses. This dataset was originally created to understand how different types of truthfulness may be represented in the intermediate activations of large language models. EasyQA compromises 2346 questions that span 50 categories, including art, technology, education, music, and animals. The questions are meant to be extremely simple and obvious, eliciting an obvious truth that would not be susceptible to misconceptions -- making it an excellent comparison compared to benchmarks related to other types of truth (e.g. TruthfulQA, which focuses on common misconceptions).
Credits to Kevin Wang, Richard Ren, and Phillip Guo.
## Dataset Creation
The dataset was created by prompting GPT-3.5-turbo with: "*Please generate 50 easy, obvious, common-knowledge questions that a kindergartener would learn in class about the topic prompted, as well as correct and incorrect responses. These questions should be less like trivia questions (i.e. Who is known as the Queen of Jazz?) and more like obvious facts (ie What color is the sky?). Your generations should be in the format: Question: {Your question here} Right: {Right answer} Wrong: {Wrong answer} where each question is a new line. Please follow this format verbatim (e.g. do not number the questions).*"
The following categories were used:
```
Animals
Plants
Food and drink
Music
Movies
Television shows
Literature
Sports
Geography
History
Science
Mathematics
Art
Technology
Politics
Business and Economy
Education
Health and Fitness
Environment and Climate
Space and Astronomy
Fashion and Style
Video Games
Travel and Tourism
Language and Literature
Religion and Spirituality
Famous Personalities
Cultural Events/Festivals
Cars and Automobiles
Photography
Architecture
Medicine and Health
Psychology
Philosophy
Law
Social Sciences
Human Rights
Current Events/News
Global Affairs
National Landmarks
Celebrities and Entertainment
Nature
Cooking and Baking
Gardening
DIY Projects
Dance
Comic Books and Graphic Novels
Mythology and Folklore
Internet and Social Media
Parenting and Family Life
Home Decor
```
提供机构:
notrichardren
原始信息汇总
数据集概述
数据集名称
- EasyQA: A Kindergarten-Level QA Dataset for Investigating Truthfulness
数据集目的
- 用于评估大型语言模型对“常识”真实性响应的能力。
- 旨在理解不同类型的真实性如何在大型语言模型的中间激活中表现。
数据集内容
- 包含2346个问题,涵盖50个类别,如艺术、技术、教育、音乐和动物等。
- 问题设计为极其简单和明显,以引出不易产生误解的明显事实。
数据集创建
- 通过向GPT-3.5-turbo发出特定提示生成,要求生成50个简单、明显的常识问题,每个问题包含正确和错误答案。
数据集类别
- 包括动物、植物、食物和饮料、音乐等50个类别。
数据集语言
- 英语(en)
许可证
- Apache-2.0



