allenai/quoref

Name: allenai/quoref
Creator: allenai
Published: 2024-01-18 11:14:21
License: CC-BY-4.0

Source: Hugging Face. Updated 2024-01-18; indexed 2024-05-25.

Download link:

https://hf-mirror.com/datasets/allenai/quoref


Resource description:

---
annotations_creators:
- crowdsourced
language:
- en
language_creators:
- found
license:
- cc-by-4.0
multilinguality:
- monolingual
pretty_name: Quoref
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- question-answering
task_ids: []
paperswithcode_id: quoref
tags:
- coreference-resolution
dataset_info:
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: context
    dtype: string
  - name: title
    dtype: string
  - name: url
    dtype: string
  - name: answers
    sequence:
    - name: answer_start
      dtype: int32
    - name: text
      dtype: string
  splits:
  - name: train
    num_bytes: 44377729
    num_examples: 19399
  - name: validation
    num_bytes: 5442031
    num_examples: 2418
  download_size: 5078438
  dataset_size: 49819760
---

# Dataset Card for "quoref"

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** https://allenai.org/data/quoref
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Paper:** [Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning](https://aclanthology.org/D19-1606/)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Size of downloaded dataset files:** 5.08 MB
- **Size of the generated dataset:** 49.82 MB
- **Total amount of disk used:** 54.90 MB

### Dataset Summary

Quoref is a QA dataset which tests the coreferential reasoning capability of reading comprehension systems. In this span-selection benchmark containing 24K questions over 4.7K paragraphs from Wikipedia, a system must resolve hard coreferences before selecting the appropriate span(s) in the paragraphs for answering questions.

### Supported Tasks and Leaderboards

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Languages

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Dataset Structure

### Data Instances

#### default

- **Size of downloaded dataset files:** 5.08 MB
- **Size of the generated dataset:** 49.82 MB
- **Total amount of disk used:** 54.90 MB

An example of 'validation' looks as follows.

```
This example was too long and was cropped:

{
    "answers": {
        "answer_start": [1633],
        "text": ["Frankie"]
    },
    "context": "\"Frankie Bono, a mentally disturbed hitman from Cleveland, comes back to his hometown in New York City during Christmas week to ...",
    "id": "bfc3b34d6b7e73c0bd82a009db12e9ce196b53e6",
    "question": "What is the first name of the person who has until New Year's Eve to perform a hit?",
    "title": "Blast of Silence",
    "url": "https://en.wikipedia.org/wiki/Blast_of_Silence"
}
```

### Data Fields

The data fields are the same among all splits.

#### default

- `id`: a `string` feature.
- `question`: a `string` feature.
- `context`: a `string` feature.
- `title`: a `string` feature.
- `url`: a `string` feature.
- `answers`: a dictionary feature containing:
  - `answer_start`: an `int32` feature.
  - `text`: a `string` feature.

### Data Splits

| name    | train | validation |
|---------|------:|-----------:|
| default | 19399 |       2418 |

## Dataset Creation

### Curation Rationale

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

#### Who are the source language producers?

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Annotations

#### Annotation process

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

#### Who are the annotators?

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Personal and Sensitive Information

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Discussion of Biases

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Other Known Limitations

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Additional Information

### Dataset Curators

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Licensing Information

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Citation Information

```
@article{allenai:quoref,
  author  = {Pradeep Dasigi and Nelson F. Liu and Ana Marasovic and Noah A. Smith and Matt Gardner},
  title   = {Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning},
  journal = {arXiv:1908.05803v2},
  year    = {2019},
}
```

### Contributions

Thanks to [@lewtun](https://github.com/lewtun), [@patrickvonplaten](https://github.com/patrickvonplaten), [@thomwolf](https://github.com/thomwolf) for adding this dataset.
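
The size figures quoted in the card can be cross-checked against the byte counts in its `dataset_info` metadata. The sketch below is only an illustration: the constant and function names are hypothetical, and it assumes the card reports decimal megabytes (10^6 bytes), which is the reading under which the numbers agree.

```python
# Byte counts copied from the card's `dataset_info` metadata.
SPLIT_BYTES = {"train": 44_377_729, "validation": 5_442_031}
DOWNLOAD_SIZE = 5_078_438
DATASET_SIZE = 49_819_760


def to_mb(num_bytes: int) -> float:
    """Convert bytes to decimal megabytes (assumed: 1 MB = 10**6 bytes)."""
    return round(num_bytes / 1_000_000, 2)


# The per-split byte counts should add up to the generated dataset size.
generated = sum(SPLIT_BYTES.values())
# Total disk use is the download plus the generated dataset.
total = DOWNLOAD_SIZE + DATASET_SIZE

print(generated == DATASET_SIZE)
print(to_mb(DOWNLOAD_SIZE), to_mb(DATASET_SIZE), to_mb(total))
```

Under that assumption the three values correspond to the 5.08 MB, 49.82 MB, and 54.90 MB figures listed in the card.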


Provider:

allenai

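Summary of original information

Dataset overview

Name: Quoref

Language: English (en)

License: CC-BY-4.0

Multilinguality: monolingual

Size: 10K<n<100K

Source: original data

Task category: question answering

Tags: coreference resolution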

Dataset structure

Data instances

id: string
question: string
context: string
title: string
url: string
answers: dictionary containing:
- answer_start: integer (int32)
- text: string
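
The relationship between `answer_start`, `text`, and `context` can be sketched as follows. This is a minimal illustration, not the dataset's official tooling: it assumes `answer_start` is a character offset into `context` (the usual convention for span-selection datasets), the helper names are hypothetical, and the record is a made-up toy example rather than a real Quoref row.

```python
from typing import Any, Dict, List, Tuple


def extract_spans(example: Dict[str, Any]) -> List[Tuple[int, str, str]]:
    """Return (start, expected_text, text_found_in_context) for each answer."""
    context = example["context"]
    answers = example["answers"]
    spans = []
    for start, text in zip(answers["answer_start"], answers["text"]):
        # Assumption: `start` is a character offset into `context`.
        found = context[start:start + len(text)]
        spans.append((start, text, found))
    return spans


def spans_are_consistent(example: Dict[str, Any]) -> bool:
    """True if every answer text appears in the context at its stated offset."""
    return all(expected == found for _, expected, found in extract_spans(example))


# Hypothetical toy record with the same field layout as the dataset.
toy = {
    "id": "toy-0",
    "question": "What is the first name of the hitman?",
    "context": "Frankie Bono is a hitman. He returns to New York.",
    "title": "Toy example",
    "url": "https://example.org/toy",
    "answers": {"answer_start": [0], "text": ["Frankie"]},
}

print(spans_are_consistent(toy))
```

Because `answer_start` and `text` are parallel lists, a question with several answer spans is handled by the same loop.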

Data splits

Name	Train	Validation
default	19399	2418
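
After downloading the data, the split sizes above can serve as a quick sanity check. The sketch below is illustrative only: the constant and function names are hypothetical, and the commented-out loading step assumes the Hugging Face `datasets` library is installed and that the dataset is reachable under the identifier `allenai/quoref`.

```python
from typing import Dict, Mapping, Optional, Tuple

# Split sizes as listed in the table above.
EXPECTED_SPLIT_SIZES: Dict[str, int] = {"train": 19_399, "validation": 2_418}


def split_size_mismatches(
    actual: Mapping[str, int],
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {split: (expected, actual)} for each split whose size differs."""
    mismatches = {}
    for name, expected in EXPECTED_SPLIT_SIZES.items():
        found = actual.get(name)  # None if the split is missing entirely
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


total_examples = sum(EXPECTED_SPLIT_SIZES.values())
validation_share = EXPECTED_SPLIT_SIZES["validation"] / total_examples
print(total_examples)
print(f"{validation_share:.1%}")

# Possible usage (assumption: network access and the `datasets` package):
#   from datasets import load_dataset
#   ds = load_dataset("allenai/quoref")
#   sizes = {name: split.num_rows for name, split in ds.items()}
#   print(split_size_mismatches(sizes))  # an empty dict means the sizes match
```

The two listed splits contain 21,817 examples in total, with validation making up roughly 11% of them.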

Dataset creation

Annotation creators: crowdsourced

Paper: Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning

Citation information:

@article{allenai:quoref, author = {Pradeep Dasigi and Nelson F. Liu and Ana Marasovic and Noah A. Smith and Matt Gardner}, title = {Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning}, journal = {arXiv:1908.05803v2 }, year = {2019}, }

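Collected summary

Dataset introduction

Background and challenges

Background overview

Quoref is an English question-answering dataset with 24K questions over 4.7K Wikipedia paragraphs, designed to test the coreferential reasoning ability of reading comprehension systems. It is released under the CC-BY-4.0 license and is framed as a span-selection benchmark: a system must resolve hard coreferences before it can answer a question.

The above content was collected and summarized by 遇见数据集.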
