wisenut-nlp-team/squad_kor_v1
Source: Hugging Face. Updated 2023-08-03; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/wisenut-nlp-team/squad_kor_v1
Resource description:
---
annotations_creators:
- crowdsourced
language_creators:
- found
language:
- ko
license:
- cc-by-nd-4.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- question-answering
task_ids:
- extractive-qa
paperswithcode_id: korquad
pretty_name: The Korean Question Answering Dataset
dataset_info:
  features:
  - name: id
    dtype: string
  - name: title
    dtype: string
  - name: context
    dtype: string
  - name: question
    dtype: string
  - name: answers
    sequence:
    - name: text
      dtype: string
    - name: answer_start
      dtype: int32
  config_name: squad_kor_v1_512
  splits:
  - name: train
    num_examples: 60407
  - name: validation
    num_examples: 5774
viewer: true
---
# Dataset Card for KorQuAD v1.0 512 Tokens
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** https://korquad.github.io/KorQuad%201.0/
- **Repository:** https://github.com/korquad/korquad.github.io/tree/master/dataset
- **Paper:** https://arxiv.org/abs/1909.07005
### Dataset Summary
KorQuAD v1.0 (The Korean Question Answering Dataset) is a Korean-language dataset for extractive question answering. This card covers the `squad_kor_v1_512` configuration, which contains 60,407 training examples and 5,774 validation examples. Each example has `id`, `title`, `context`, `question`, and `answers` (`text`, `answer_start`) fields.
### Supported Tasks and Leaderboards
[More Information Needed]
### Languages
[More Information Needed]
## Dataset Structure
### Data Instances
[More Information Needed]
### Data Fields
[More Information Needed]
### Data Splits
[More Information Needed]
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
[More Information Needed]
### Contributions
[More Information Needed]
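Provider:
wisenut-nlp-team
Summary of original information
Dataset overview
Basic dataset information
- Name: The Korean Question Answering Dataset
- Alias: KorQuAD v1.0 512 Tokens
- Language: Korean (ko)
- License: cc-by-nd-4.0
- Multilinguality: monolingual
- Size: 10K<n<100K
- Task category: question answering
- Task ID: extractive-qa
- Papers with Code ID: korquad
Dataset structure
Data instances
- Features: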
  - id: string
  - title: string
  - context: string
  - question: string
  - answers: sequence
    - text: string
    - answer_start: integer (int32)
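
The feature list above can be sketched as Python types, together with a small consistency check. This is a minimal sketch, not part of the dataset's tooling. It assumes that `answers` arrives as a dictionary of parallel lists (the usual way the `datasets` library exposes a sequence of named sub-features) and that `answer_start` is a character offset into `context`, as in the SQuAD format; the card itself does not confirm either point. The names `Example`, `Answers`, `misaligned_answers`, and the sample record are hypothetical.

```python
from typing import List, TypedDict


class Answers(TypedDict):
    text: List[str]          # answer strings
    answer_start: List[int]  # assumed: character offsets into `context`


class Example(TypedDict):
    id: str
    title: str
    context: str
    question: str
    answers: Answers


def misaligned_answers(example: Example) -> List[int]:
    """Return indices of answers whose text does not appear at answer_start."""
    context = example["context"]
    answers = example["answers"]
    bad = []
    for i, (text, start) in enumerate(zip(answers["text"], answers["answer_start"])):
        # A negative offset or a mismatching slice both count as misaligned.
        if start < 0 or context[start:start + len(text)] != text:
            bad.append(i)
    return bad


# Made-up record for illustration only; it is not taken from the dataset.
sample: Example = {
    "id": "example-0",
    "title": "서울",
    "context": "서울은 대한민국의 수도이다.",
    "question": "대한민국의 수도는 어디인가?",
    "answers": {"text": ["서울"], "answer_start": [0]},
}
```

Calling `misaligned_answers(sample)` returns an empty list when every answer span lines up with the context. A non-empty result points at records worth inspecting before training an extractive QA model.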
Data splits
- Training set: 60,407 examples
- Validation set: 5,774 examples
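
The split sizes above can be verified after download. The following is a minimal sketch: the helper names are hypothetical, and it assumes that `load_dataset` from the Hugging Face `datasets` library can load the repository `wisenut-nlp-team/squad_kor_v1` without an explicit configuration name. If that assumption does not hold, a configuration such as `squad_kor_v1_512` may need to be passed as the second argument.

```python
from typing import Dict, Mapping, Optional, Tuple

# Split sizes as stated in the card metadata.
EXPECTED_SPLITS: Dict[str, int] = {"train": 60407, "validation": 5774}


def split_size_mismatches(
    actual: Mapping[str, int],
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Map each split whose size differs to (expected, actual-or-None)."""
    return {
        name: (expected, actual.get(name))
        for name, expected in EXPECTED_SPLITS.items()
        if actual.get(name) != expected
    }


def load_split_sizes(repo_id: str = "wisenut-nlp-team/squad_kor_v1") -> Dict[str, int]:
    """Download the dataset and count rows per split (needs network access)."""
    # Imported lazily so the checker above works without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset(repo_id)  # assumed to return a dict-like of splits
    return {name: len(split) for name, split in dataset.items()}
```

A typical use is `split_size_mismatches(load_split_sizes())`, which returns an empty dictionary when the downloaded splits match the card. The two splits total 66,181 examples, consistent with the `10K<n<100K` size category.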
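Dataset creation
Annotations
- Annotation creators: crowdsourced
- Language creators: found
Source data
- Source datasets: original
Notes
- Considerations for use: the card's sections on social impact, discussion of biases, and other known limitations are not filled in.
- Additional information: the card's sections on dataset curators, licensing information, citation information, and contributions are not filled in.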



