OpenBookQA

Opencsg · Updated 2024-03-28 · Indexed 2024-06-22

Download link:

https://www.opencsg.com/datasets/OpenDataLab/OpenBookQA

Description:

OpenBookQA is a question-answering dataset modeled after open-book exams for assessing human understanding of a subject. It consists of 5,957 multiple-choice elementary-level science questions (4,957 train, 500 dev, 500 test) that probe understanding of a small "book" of 1,326 core science facts and the application of those facts to novel situations. For training, the dataset includes a mapping from each question to the core science fact it was designed to probe. Answering OpenBookQA questions requires additional broad common knowledge that is not contained in the book. The questions are, by design, answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. The dataset also includes a collection of 5,167 crowd-sourced common-knowledge facts and an expanded version of the train/dev/test questions, in which each question is associated with its originating core fact, a human accuracy score, a clarity score, and an anonymized crowd-worker ID.
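To make the layout described above concrete, here is a minimal Python sketch of one expanded question record and a check of the stated split sizes. The class name, field names, and the sample question are illustrative assumptions, not the dataset's actual schema or contents; only the split sizes come from the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Split sizes as stated in the dataset description.
SPLIT_SIZES: Dict[str, int] = {"train": 4957, "dev": 500, "test": 500}


@dataclass
class ExpandedQuestion:
    """Hypothetical record layout; field names are assumptions."""
    question: str
    choices: List[str]
    answer_index: int      # index of the correct choice
    core_fact: str         # the originating core science fact
    human_accuracy: float  # human accuracy score
    clarity: float         # clarity score
    worker_id: str         # anonymized crowd-worker ID


def total_questions(split_sizes: Dict[str, int]) -> int:
    """Sum the per-split counts."""
    return sum(split_sizes.values())


def accuracy(records: Sequence[ExpandedQuestion],
             predictions: Sequence[int]) -> float:
    """Fraction of records whose predicted choice index is correct."""
    if len(records) != len(predictions):
        raise ValueError("records and predictions must have equal length")
    if not records:
        return 0.0
    hits = sum(r.answer_index == p for r, p in zip(records, predictions))
    return hits / len(records)


# Made-up sample for illustration; NOT taken from the dataset.
sample = ExpandedQuestion(
    question="Which object is most likely to conduct electricity?",
    choices=["a wooden spoon", "a copper wire", "a rubber band", "a glass cup"],
    answer_index=1,
    core_fact="Metals are good conductors of electricity.",
    human_accuracy=1.0,
    clarity=2.0,
    worker_id="worker-000",
)

if __name__ == "__main__":
    print(total_questions(SPLIT_SIZES))
    print(accuracy([sample], [1]))
```

The three split sizes sum to the 5,957 questions quoted in the description. A real loader would need to map the published files onto whatever fields they actually contain.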

Created:

2024-03-28

Collected summary

Background overview

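OpenBookQA is a question-answering dataset of 5,957 multiple-choice science questions, designed to assess understanding of 1,326 core science facts and their application to new situations. The dataset is designed to challenge retrieval-based and word co-occurrence algorithms, and it includes additional common-knowledge facts and expanded versions of the questions. It is suitable for question-answering and text-classification tasks.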

The summary above was collected and generated by 遇见数据集.
