KETI-AIR/kor_squad_v2
Source: Hugging Face · Updated 2023-11-15 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/KETI-AIR/kor_squad_v2
Resource description:
---
language:
- ko
task_categories:
- question-answering
task_ids:
- open-domain-qa
- extractive-qa
license: cc-by-sa-4.0
dataset_info:
  features:
  - name: data_index_by_user
    dtype: int32
  - name: title
    dtype: string
  - name: context
    dtype: string
  - name: question
    dtype: string
  - name: answers
    struct:
    - name: text
      sequence: string
    - name: answer_start
      sequence: int32
  splits:
  - name: train
    num_bytes: 127009200
    num_examples: 130319
  - name: validation
    num_bytes: 12410838
    num_examples: 11873
  download_size: 19086608
  dataset_size: 139420038
---
# Dataset Card for squad_v2
## Licensing Information
The data is distributed under the [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en) license.
## Source Data Citation Information
```
@article{2016arXiv160605250R,
  author = {{Rajpurkar}, Pranav and {Zhang}, Jian and {Lopyrev}, Konstantin and {Liang}, Percy},
  title = "{SQuAD: 100,000+ Questions for Machine Comprehension of Text}",
  journal = {arXiv e-prints},
  year = 2016,
  eid = {arXiv:1606.05250},
  pages = {arXiv:1606.05250},
  archivePrefix = {arXiv},
  eprint = {1606.05250},
}
```
Provider:
KETI-AIR
Summary of original information
Dataset overview
Basic information
- Language: Korean (ko)
- Task category: question answering (question-answering)
- Task IDs:
  - Open-domain question answering (open-domain-qa)
  - Extractive question answering (extractive-qa)
- License: CC BY-SA 4.0
Dataset structure
Features
- data_index_by_user: int32
- title: string
- context: string
- question: string
- answers: struct with the following fields:
  - text: sequence of string
  - answer_start: sequence of int32
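The feature list above can be read as a record type. The following is a minimal sketch of that shape in Python, with a check that each answer text matches the span of `context` it points to. It assumes `answer_start` holds character offsets into `context` (the usual SQuAD convention; this card does not state it), and the names `Answers`, `Record`, and `check_answer_spans`, along with the sample record, are invented for illustration.

```python
from typing import TypedDict


class Answers(TypedDict):
    text: list[str]          # answer strings; may be empty
    answer_start: list[int]  # assumed: character offsets into `context`


class Record(TypedDict):
    data_index_by_user: int
    title: str
    context: str
    question: str
    answers: Answers


def check_answer_spans(record: Record) -> list[bool]:
    """For each answer, report whether context[start:start+len(text)] == text."""
    context = record["context"]
    answers = record["answers"]
    return [
        context[start:start + len(text)] == text
        for text, start in zip(answers["text"], answers["answer_start"])
    ]


# Hypothetical record, written for this example (not taken from the dataset).
example: Record = {
    "data_index_by_user": 0,
    "title": "서울",
    "context": "서울은 대한민국의 수도이다.",
    "question": "대한민국의 수도는 어디인가?",
    "answers": {"text": ["서울"], "answer_start": [0]},
}

# A record with no answers yields an empty list of checks.
no_answer: Record = {**example, "answers": {"text": [], "answer_start": []}}

print(check_answer_spans(example))
print(check_answer_spans(no_answer))
```

Real records could presumably be obtained with `datasets.load_dataset("KETI-AIR/kor_squad_v2", split="validation")` from the Hugging Face `datasets` library and passed to the same function, provided the offset assumption holds for this dataset.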
Data splits
- Training set (train):
  - Bytes: 127009200
  - Examples: 130319
- Validation set (validation):
  - Bytes: 12410838
  - Examples: 11873
Dataset size
- Download size: 19086608 bytes
- Dataset size: 139420038 bytes
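As a quick consistency check on the figures above, the short sketch below sums the per-split byte counts, compares the total with the reported dataset size, and derives a few ratios. The numbers are copied from this card; the variable names are only illustrative.

```python
# Figures copied from the split metadata in this card.
splits = {
    "train": {"num_bytes": 127_009_200, "num_examples": 130_319},
    "validation": {"num_bytes": 12_410_838, "num_examples": 11_873},
}
dataset_size = 139_420_038
download_size = 19_086_608

total_bytes = sum(s["num_bytes"] for s in splits.values())
total_examples = sum(s["num_examples"] for s in splits.values())

# The per-split byte counts should add up to the reported dataset size.
assert total_bytes == dataset_size

for name, s in splits.items():
    share = s["num_examples"] / total_examples
    avg_bytes = s["num_bytes"] / s["num_examples"]
    print(f"{name}: {share:.1%} of examples, ~{avg_bytes:.0f} bytes per example")

print(f"total examples: {total_examples}")
print(f"download size is {download_size / dataset_size:.1%} of the dataset size")
```

The two splits together hold 142,192 examples, and the download is much smaller than the reported dataset size, which would be consistent with a compressed on-disk format, although the card does not say which format is used.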



