SQuAD2.0

Name: SQuAD2.0
Creator: Kaggle
Published: 2022-11-20 00:00:00
License: No description provided

Source: www.kaggle.com, updated 2022-11-20, indexed 2025-01-21

Download link:

https://www.kaggle.com/thedevastator/squad2-0-a-challenge-for-question-answering-syst


Resource description:

# SQuAD2.0

### Adversarial questions & answers that look similar to answerable ones

_____

### Source

> **Hugging Face Hub:** [link](https://huggingface.co/datasets/squad_v2)

### About this dataset

> SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.

### Research Ideas

> - The SQuAD dataset can be used to train a machine learning model to automatically generate answers to questions.
> - The SQuAD dataset can be used to train a machine learning model to automatically generate questions based on a given context.
> - The SQuAD dataset can be used to improve the accuracy of existing question answering systems.

### Acknowledgements

> The SQuAD2.0 dataset was created by the Stanford Question Answering Dataset (SQuAD) team at Stanford University.
>
> The dataset is based on a set of documents from Wikipedia. Passages from each document are provided, along with human-generated questions about them and the corresponding answers.

### License

> **License: [CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/)**
>
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. [See Other Information](https://creativecommons.org/publicdomain/zero/1.0/).

### Columns

**File: validation.csv**

| Column name   | Description                                                                  |
|:--------------|:-----------------------------------------------------------------------------|
| **title**     | The title of the Wikipedia article. (String)                                 |
| **context**   | The Wikipedia passage the question is asked about. (String)                  |
| **question**  | The question that the model will be asked. (String)                          |
| **answers**   | The answer to the question; empty for unanswerable questions. (String)       |

_____

**File: train.csv**

| Column name   | Description                                                                  |
|:--------------|:-----------------------------------------------------------------------------|
| **title**     | The title of the Wikipedia article. (String)                                 |
| **context**   | The Wikipedia passage the question is asked about. (String)                  |
| **question**  | The question that the model will be asked. (String)                          |
| **answers**   | The answer to the question; empty for unanswerable questions. (String)       |
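
The abstention requirement described in the dataset summary can be illustrated with a small scoring sketch. It is not the official evaluation script: the normalization is only loosely modeled on what is commonly done for SQuAD-style exact match, and the example rows, the `normalize` and `exact_match` helpers, and the convention that an empty prediction means "abstain" are all assumptions made for illustration. The gold answers are assumed to be already parsed into lists of strings, since this card does not specify how the `answers` column is serialized in the CSV files.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """Score one prediction; an empty prediction means the system abstained."""
    if not gold_answers:
        # Unanswerable question: only abstaining counts as correct.
        return normalize(prediction) == ""
    return any(normalize(prediction) == normalize(gold) for gold in gold_answers)


# Hypothetical rows (not taken from the dataset); answers are pre-parsed lists.
examples = [
    {"question": "Where is Normandy located?", "answers": ["France", "in France"]},
    {"question": "Who ruled Normandy in 2500?", "answers": []},  # unanswerable
]
predictions = ["France.", ""]  # the second prediction is an abstention

scores = [exact_match(pred, ex["answers"]) for pred, ex in zip(predictions, examples)]
accuracy = sum(scores) / len(scores)
print(f"exact match: {accuracy:.2f}")
```

Under this convention, a system that always answers is penalized on every unanswerable question, while one that always abstains is penalized on every answerable question, which is the trade-off the dataset is designed to expose.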

