kyle-obrien/multilingual-squad
Source: Hugging Face · Updated: 2023-08-16 · Indexed: 2024-03-04
Download link: https://hf-mirror.com/datasets/kyle-obrien/multilingual-squad
Resource description:
---
license: cc-by-sa-4.0
configs:
- config_name: default
  data_files:
  - split: en
    path: data/en-*
  - split: de
    path: data/de-*
  - split: es
    path: data/es-*
  - split: it
    path: data/it-*
dataset_info:
  features:
  - name: id
    dtype: string
  - name: context
    dtype: string
  - name: question
    dtype: string
  - name: answers
    sequence:
    - name: text
      dtype: string
    - name: answer_start
      dtype: int32
  splits:
  - name: en
    num_bytes: 729684.8924369748
    num_examples: 778
  - name: de
    num_bytes: 838982.7781512605
    num_examples: 778
  - name: es
    num_bytes: 832574.4117647059
    num_examples: 778
  - name: it
    num_bytes: 739646.5447480896
    num_examples: 778
  download_size: 729382
  dataset_size: 3140888.627101031
---
# Dataset Card
### Dataset Summary
This dataset contains parallel SQuAD-style question-answering examples in English (EN), German (DE), Spanish (ES), and Italian (IT), with 778 examples per language. To construct it, example identifiers were aligned across the following SQuAD-related datasets:
* EN, DE, ES: [XQuAD (Cross-lingual Question Answering Dataset)](https://huggingface.co/datasets/xquad)
* IT: [SQuAD-it](https://huggingface.co/datasets/squad_it)
See citation information below.
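
The card does not describe the alignment procedure in detail. As a minimal sketch, assuming alignment means keeping only the examples whose `id` occurs in every language, it could look like the following. The helper `align_by_id` and the toy records are hypothetical and are not taken from the dataset or its build script.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def align_by_id(splits: Dict[str, List[Record]]) -> Dict[str, List[Record]]:
    """Keep only records whose id appears in every language split.

    Hypothetical helper: one plausible reading of "identifiers were aligned",
    not the dataset author's actual procedure.
    """
    # Index each language split by example id.
    indexed = {
        lang: {rec["id"]: rec for rec in records}
        for lang, records in splits.items()
    }
    # Ids that are present in all languages.
    shared = set.intersection(*(set(by_id) for by_id in indexed.values()))
    # Sort so that every language ends up in the same order.
    ordered = sorted(shared)
    return {lang: [by_id[i] for i in ordered] for lang, by_id in indexed.items()}


# Made-up toy records, for illustration only.
toy = {
    "en": [{"id": "a1", "question": "Who?"}, {"id": "b2", "question": "When?"}],
    "it": [{"id": "b2", "question": "Quando?"}, {"id": "c3", "question": "Dove?"}],
}
aligned = align_by_id(toy)
```

After alignment, position `i` in each language refers to the same underlying example, which is the property that makes the splits parallel.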
### Citation Information
XQuAD:
```
@article{Artetxe:etal:2019,
author = {Mikel Artetxe and Sebastian Ruder and Dani Yogatama},
title = {On the cross-lingual transferability of monolingual representations},
journal = {CoRR},
volume = {abs/1910.11856},
year = {2019},
archivePrefix = {arXiv},
eprint = {1910.11856}
}
```
SQuAD-it:
```
@InProceedings{10.1007/978-3-030-03840-3_29,
author="Croce, Danilo and Zelenanska, Alexandra and Basili, Roberto",
editor="Ghidini, Chiara and Magnini, Bernardo and Passerini, Andrea and Traverso, Paolo",
title="Neural Learning for Question Answering in Italian",
booktitle="AI*IA 2018 -- Advances in Artificial Intelligence",
year="2018",
publisher="Springer International Publishing",
address="Cham",
pages="389--402",
isbn="978-3-030-03840-3"
}
```
Provider:
kyle-obrien
Summary of original information
Dataset overview
Dataset information
- License: cc-by-sa-4.0
- Configurations:
  - Default configuration (`default`):
    - Data files:
      - Splits (languages): en, de, es, it
      - Paths: data/en-*, data/de-*, data/es-*, data/it-*
- Dataset features:
  - id: string
  - context: string
  - question: string
  - answers:
    - text: string
    - answer_start: integer (int32)
- Dataset splits:
  - en: 778 examples, 729684.8924369748 bytes
  - de: 778 examples, 838982.7781512605 bytes
  - es: 778 examples, 832574.4117647059 bytes
  - it: 778 examples, 739646.5447480896 bytes
- Download size: 729382 bytes
- Dataset size: 3140888.627101031 bytes
Dataset sources
- EN, DE, ES: XQuAD (Cross-lingual Question Answering Dataset)
- IT: SQuAD-it
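
To illustrate the feature schema listed above (`id`, `context`, `question`, and `answers` with `text` and `answer_start`), here is a small sketch of a single record and a consistency check on its answer spans. It assumes the usual SQuAD convention that `answer_start` is a character offset into `context`, and that the `answers` sequence is exposed as a dictionary of parallel lists. The record and the helper `answer_spans_match` are hypothetical and not drawn from the dataset.

```python
from typing import Any, Dict, List


def answer_spans_match(example: Dict[str, Any]) -> List[bool]:
    """Check, for each answer, that context[start:start+len(text)] == text.

    Assumes answer_start is a character offset into context (SQuAD convention).
    """
    context = example["context"]
    answers = example["answers"]
    return [
        context[start:start + len(text)] == text
        for text, start in zip(answers["text"], answers["answer_start"])
    ]


# Made-up record following the schema; not an actual dataset entry.
example = {
    "id": "hypothetical-0001",
    "context": "Rome is the capital of Italy.",
    "question": "What is the capital of Italy?",
    "answers": {"text": ["Rome"], "answer_start": [0]},
}

# Same record with a deliberately wrong offset.
shifted = dict(example, answers={"text": ["Rome"], "answer_start": [5]})
```

A check like this can be worth running per language, since translated contexts shift character offsets and a misaligned `answer_start` would otherwise go unnoticed.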



