MajdTannous/Dataset2

Name: MajdTannous/Dataset2
Creator: MajdTannous
Published: 2023-10-21 08:29:39
License: No description provided

Source: Hugging Face. Updated 2023-10-21; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/MajdTannous/Dataset2


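Summary:

The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to every question is a segment of text (a span) from the corresponding reading passage, or the question may be unanswerable. The dataset contains a training split and a validation split with 87,599 and 10,570 examples, respectively. Its fields are id, title, context, question, and answers, where answers has two subfields: text and answer_start.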
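Because every answer is a span of its passage, the answer text can be recovered from context by character offset. The sketch below assumes answer_start is a 0-based character index into context (the usual SQuAD convention); the helper names `extract_span` and `span_matches` and the sample record are hypothetical, not taken from the dataset.

```python
def extract_span(context: str, text: str, answer_start: int) -> str:
    """Return the slice of `context` that an answer annotation points at.

    Assumes `answer_start` is a 0-based character offset into `context`.
    """
    return context[answer_start:answer_start + len(text)]


def span_matches(context: str, text: str, answer_start: int) -> bool:
    """Check that the annotated offset really locates the answer text."""
    return extract_span(context, text, answer_start) == text


# Hypothetical record for illustration only.
context = "SQuAD was released by Stanford in 2016."
answer = {"text": "Stanford", "answer_start": 22}
print(extract_span(context, answer["text"], answer["answer_start"]))  # Stanford
```

A check like `span_matches` is a cheap way to catch records whose offsets and answer texts have drifted apart.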

Provider:

MajdTannous

Summary of original information

Dataset overview

Basic information

Name: SQuAD
Language: English
License: CC BY 4.0
Multilinguality: monolingual
Dataset size: 10K<n<100K
Source datasets: extended from Wikipedia
Task category: question answering
Task ID: extractive QA
PapersWithCode ID: squad

Dataset structure

Data instances

Configuration: plain_text
Example:
{
  "answers": {"answer_start": [1], "text": ["This is a test text"]},
  "context": "This is a test context.",
  "id": "1",
  "question": "Is this a test?",
  "title": "train test"
}
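The instance above is an ordinary JSON object, so it can be read with Python's standard `json` module. This sketch simply copies the example record into a string; because answers stores parallel lists, an individual answer is read by index.

```python
import json

# The plain_text example record shown above, copied verbatim.
raw = """
{
  "answers": {"answer_start": [1], "text": ["This is a test text"]},
  "context": "This is a test context.",
  "id": "1",
  "question": "Is this a test?",
  "title": "train test"
}
"""

record = json.loads(raw)

# `answers` holds parallel lists, so the first answer sits at index 0.
first_text = record["answers"]["text"][0]
first_start = record["answers"]["answer_start"][0]
print(record["question"], "->", first_text, "@", first_start)
```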

Data fields

id: string
title: string
context: string
question: string
answers: dictionary (each subfield holds a list, as in the example above) containing:
  - text: string
  - answer_start: integer
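The field list above can be turned into a lightweight schema check. This is a hypothetical helper, not part of any library; it assumes answers stores parallel lists under text (strings) and answer_start (integers), as the example instance does, and it reports problems instead of raising so that many records can be screened at once.

```python
from typing import Any

# Expected top-level fields and their Python types, per the field list above.
TOP_LEVEL = {"id": str, "title": str, "context": str, "question": str}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of schema problems; an empty list means the record conforms."""
    problems = []
    for name, expected in TOP_LEVEL.items():
        if not isinstance(record.get(name), expected):
            problems.append(f"{name}: expected {expected.__name__}")

    answers = record.get("answers")
    if not isinstance(answers, dict):
        return problems + ["answers: expected dict"]

    texts = answers.get("text")
    starts = answers.get("answer_start")
    if not isinstance(texts, list) or not all(isinstance(t, str) for t in texts):
        problems.append("answers.text: expected a list of strings")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(starts, list) or not all(
        isinstance(s, int) and not isinstance(s, bool) for s in starts
    ):
        problems.append("answers.answer_start: expected a list of integers")
    if isinstance(texts, list) and isinstance(starts, list) and len(texts) != len(starts):
        problems.append("answers: text and answer_start lengths differ")
    return problems


good = {
    "id": "1", "title": "t", "context": "c", "question": "q",
    "answers": {"text": ["c"], "answer_start": [0]},
}
print(validate_record(good))  # []
```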

Data splits

train: 87,599 examples
validation: 10,570 examples
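The split sizes imply roughly an 89/11 train/validation ratio, and their total is consistent with the "10K<n<100K" size category listed under basic information. A quick check of that arithmetic:

```python
# Split sizes as listed above.
splits = {"train": 87_599, "validation": 10_570}
total = sum(splits.values())  # 98,169 examples overall

for name, count in splits.items():
    print(f"{name}: {count:,} ({count / total:.1%})")

# The total falls inside the declared size category.
print(10_000 < total < 100_000)  # True
```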

Dataset creation

Dataset features

Features:
- id: string
- title: string
- context: string
- question: string
- answers: sequence containing:
  - text: string
  - answer_start: integer
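In the Hugging Face `datasets` library, a `Sequence` whose inner feature is a dictionary is returned column-wise, as a dict of lists rather than a list of dicts; that is why the example instance's answers field holds lists. The hypothetical helper below converts that layout into one dict per answer, which is often easier to iterate over.

```python
from typing import Any


def answers_to_rows(answers: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Turn the column-wise `answers` layout into one dict per answer.

    zip() pairs the i-th text with the i-th answer_start.
    """
    if len(answers["text"]) != len(answers["answer_start"]):
        raise ValueError("text and answer_start must have the same length")
    return [
        {"text": text, "answer_start": start}
        for text, start in zip(answers["text"], answers["answer_start"])
    ]


# Column-wise layout, as in the example instance.
columnar = {"text": ["This is a test text"], "answer_start": [1]}
print(answers_to_rows(columnar))
```

The length check guards against silently dropping answers, since `zip` stops at the shorter list.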

Dataset size

Download size: 35.14 MB
Generated size: 89.92 MB
Total disk usage: 125.06 MB
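The total disk usage is the sum of the downloaded archive size and the generated (prepared) dataset size:

```latex
\underbrace{35.14\ \text{MB}}_{\text{download}} + \underbrace{89.92\ \text{MB}}_{\text{generated}} = 125.06\ \text{MB}
```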

Citation information

@article{2016arXiv160605250R,
  author = {{Rajpurkar}, Pranav and {Zhang}, Jian and {Lopyrev}, Konstantin and {Liang}, Percy},
  title = "{SQuAD: 100,000+ Questions for Machine Comprehension of Text}",
  journal = {arXiv e-prints},
  year = 2016,
  eid = {arXiv:1606.05250},
  pages = {arXiv:1606.05250},
  archivePrefix = {arXiv},
  eprint = {1606.05250},
}
