xquad
收藏魔搭社区2026-01-06 更新2025-04-26 收录
下载链接:
https://modelscope.cn/datasets/google/xquad
下载链接
链接失效反馈官方服务:
资源简介:
# Dataset Card for "xquad"
## Table of Contents
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
- [Languages](#languages)
- [Dataset Structure](#dataset-structure)
- [Data Instances](#data-instances)
- [Data Fields](#data-fields)
- [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
- [Curation Rationale](#curation-rationale)
- [Source Data](#source-data)
- [Annotations](#annotations)
- [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
- [Social Impact of Dataset](#social-impact-of-dataset)
- [Discussion of Biases](#discussion-of-biases)
- [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
- [Dataset Curators](#dataset-curators)
- [Licensing Information](#licensing-information)
- [Citation Information](#citation-information)
- [Contributions](#contributions)
## Dataset Description
- **Homepage:** [https://github.com/deepmind/xquad](https://github.com/deepmind/xquad)
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Size of downloaded dataset files:** 146.31 MB
- **Size of the generated dataset:** 18.97 MB
- **Total amount of disk used:** 165.28 MB
### Dataset Summary
XQuAD (Cross-lingual Question Answering Dataset) is a benchmark dataset for evaluating cross-lingual question answering
performance. The dataset consists of a subset of 240 paragraphs and 1190 question-answer pairs from the development set
of SQuAD v1.1 (Rajpurkar et al., 2016) together with their professional translations into ten languages: Spanish, German,
Greek, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, and Hindi. Consequently, the dataset is entirely parallel
across 11 languages.
### Supported Tasks and Leaderboards
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Languages
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## Dataset Structure
### Data Instances
#### xquad.ar
- **Size of downloaded dataset files:** 13.30 MB
- **Size of the generated dataset:** 1.72 MB
- **Total amount of disk used:** 15.03 MB
An example of 'validation' looks as follows.
```
This example was too long and was cropped:
{
"answers": {
"answer_start": [527],
"text": ["136"]
},
"context": "\"Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...",
"id": "56beb4343aeaaa14008c925c",
"question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?"
}
```
#### xquad.de
- **Size of downloaded dataset files:** 13.30 MB
- **Size of the generated dataset:** 1.29 MB
- **Total amount of disk used:** 14.59 MB
An example of 'validation' looks as follows.
```
This example was too long and was cropped:
{
"answers": {
"answer_start": [527],
"text": ["136"]
},
"context": "\"Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...",
"id": "56beb4343aeaaa14008c925c",
"question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?"
}
```
#### xquad.el
- **Size of downloaded dataset files:** 13.30 MB
- **Size of the generated dataset:** 2.21 MB
- **Total amount of disk used:** 15.51 MB
An example of 'validation' looks as follows.
```
This example was too long and was cropped:
{
"answers": {
"answer_start": [527],
"text": ["136"]
},
"context": "\"Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...",
"id": "56beb4343aeaaa14008c925c",
"question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?"
}
```
#### xquad.en
- **Size of downloaded dataset files:** 13.30 MB
- **Size of the generated dataset:** 1.12 MB
- **Total amount of disk used:** 14.42 MB
An example of 'validation' looks as follows.
```
This example was too long and was cropped:
{
"answers": {
"answer_start": [527],
"text": ["136"]
},
"context": "\"Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...",
"id": "56beb4343aeaaa14008c925c",
"question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?"
}
```
#### xquad.es
- **Size of downloaded dataset files:** 13.30 MB
- **Size of the generated dataset:** 1.28 MB
- **Total amount of disk used:** 14.58 MB
An example of 'validation' looks as follows.
```
This example was too long and was cropped:
{
"answers": {
"answer_start": [527],
"text": ["136"]
},
"context": "\"Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...",
"id": "56beb4343aeaaa14008c925c",
"question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?"
}
```
### Data Fields
The data fields are the same among all splits.
#### xquad.ar
- `id`: a `string` feature.
- `context`: a `string` feature.
- `question`: a `string` feature.
- `answers`: a dictionary feature containing:
- `text`: a `string` feature.
- `answer_start`: a `int32` feature.
#### xquad.de
- `id`: a `string` feature.
- `context`: a `string` feature.
- `question`: a `string` feature.
- `answers`: a dictionary feature containing:
- `text`: a `string` feature.
- `answer_start`: a `int32` feature.
#### xquad.el
- `id`: a `string` feature.
- `context`: a `string` feature.
- `question`: a `string` feature.
- `answers`: a dictionary feature containing:
- `text`: a `string` feature.
- `answer_start`: a `int32` feature.
#### xquad.en
- `id`: a `string` feature.
- `context`: a `string` feature.
- `question`: a `string` feature.
- `answers`: a dictionary feature containing:
- `text`: a `string` feature.
- `answer_start`: a `int32` feature.
#### xquad.es
- `id`: a `string` feature.
- `context`: a `string` feature.
- `question`: a `string` feature.
- `answers`: a dictionary feature containing:
- `text`: a `string` feature.
- `answer_start`: a `int32` feature.
### Data Splits
| name | validation |
| -------- | ---------: |
| xquad.ar | 1190 |
| xquad.de | 1190 |
| xquad.el | 1190 |
| xquad.en | 1190 |
| xquad.es | 1190 |
## Dataset Creation
### Curation Rationale
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
#### Who are the source language producers?
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Annotations
#### Annotation process
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
#### Who are the annotators?
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Personal and Sensitive Information
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Discussion of Biases
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Other Known Limitations
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## Additional Information
### Dataset Curators
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Licensing Information
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Citation Information
```
@article{Artetxe:etal:2019,
author = {Mikel Artetxe and Sebastian Ruder and Dani Yogatama},
title = {On the cross-lingual transferability of monolingual representations},
journal = {CoRR},
volume = {abs/1910.11856},
year = {2019},
archivePrefix = {arXiv},
eprint = {1910.11856}
}
```
### Contributions
Thanks to [@lewtun](https://github.com/lewtun), [@patrickvonplaten](https://github.com/patrickvonplaten), [@thomwolf](https://github.com/thomwolf) for adding this dataset.
# 数据集卡片:"XQuAD"
## 目录
- [数据集概述](#dataset-description)
- [数据集摘要](#dataset-summary)
- [支持任务与排行榜](#supported-tasks-and-leaderboards)
- [语言](#languages)
- [数据集结构](#dataset-structure)
- [数据实例](#data-instances)
- [数据字段](#data-fields)
- [数据划分](#data-splits)
- [数据集构建](#dataset-creation)
- [构建初衷](#curation-rationale)
- [源数据](#source-data)
- [标注信息](#annotations)
- [个人与敏感信息](#personal-and-sensitive-information)
- [数据使用注意事项](#considerations-for-using-the-data)
- [数据集的社会影响](#social-impact-of-dataset)
- [偏差讨论](#discussion-of-biases)
- [其他已知局限性](#other-known-limitations)
- [附加信息](#additional-information)
- [数据集维护者](#dataset-curators)
- [授权信息](#licensing-information)
- [引用信息](#citation-information)
- [贡献者](#contributions)
## 数据集概述
- **主页**:[https://github.com/deepmind/xquad](https://github.com/deepmind/xquad)
- **代码仓库**:[更多信息待补充](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **相关论文**:[更多信息待补充](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **联系人**:[更多信息待补充](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **下载数据集文件大小**:146.31 MB
- **生成数据集大小**:18.97 MB
- **总磁盘占用**:165.28 MB
### 数据集摘要
XQuAD(跨语言问答数据集,Cross-lingual Question Answering Dataset)是用于评估跨语言问答性能的基准测试数据集。该数据集从SQuAD v1.1(Rajpurkar等人,2016)的开发集中选取了240段文本与1190组问答对,并将其专业翻译为10种语言:西班牙语、德语、希腊语、俄语、土耳其语、阿拉伯语、越南语、泰语、汉语与印地语。因此,该数据集在11种语言间完全对齐。
### 支持任务与排行榜
[更多信息待补充](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### 语言
[更多信息待补充](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## 数据集结构
### 数据实例
#### xquad.ar
- **下载数据集文件大小**:13.30 MB
- **生成数据集大小**:1.72 MB
- **总磁盘占用**:15.03 MB
"验证集"的一个示例如下:
This example was too long and was cropped:
{
"answers": {
"answer_start": [527],
"text": ["136"]
},
"context": ""Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...",
"id": "56beb4343aeaaa14008c925c",
"question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?"
}
#### xquad.de
- **下载数据集文件大小**:13.30 MB
- **生成数据集大小**:1.29 MB
- **总磁盘占用**:14.59 MB
"验证集"的一个示例如下:
This example was too long and was cropped:
{
"answers": {
"answer_start": [527],
"text": ["136"]
},
"context": ""Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...",
"id": "56beb4343aeaaa14008c925c",
"question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?"
}
#### xquad.el
- **下载数据集文件大小**:13.30 MB
- **生成数据集大小**:2.21 MB
- **总磁盘占用**:15.51 MB
"验证集"的一个示例如下:
This example was too long and was cropped:
{
"answers": {
"answer_start": [527],
"text": ["136"]
},
"context": ""Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...",
"id": "56beb4343aeaaa14008c925c",
"question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?"
}
#### xquad.en
- **下载数据集文件大小**:13.30 MB
- **生成数据集大小**:1.12 MB
- **总磁盘占用**:14.42 MB
"验证集"的一个示例如下:
This example was too long and was cropped:
{
"answers": {
"answer_start": [527],
"text": ["136"]
},
"context": ""Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...",
"id": "56beb4343aeaaa14008c925c",
"question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?"
}
#### xquad.es
- **下载数据集文件大小**:13.30 MB
- **生成数据集大小**:1.28 MB
- **总磁盘占用**:14.58 MB
"验证集"的一个示例如下:
This example was too long and was cropped:
{
"answers": {
"answer_start": [527],
"text": ["136"]
},
"context": ""Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...",
"id": "56beb4343aeaaa14008c925c",
"question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?"
}
### 数据字段
所有划分的数据字段均保持一致。
#### xquad.ar
- `id`:字符串类型特征。
- `context`:字符串类型特征。
- `question`:字符串类型特征。
- `answers`:字典类型特征,包含:
- `text`:字符串类型特征。
- `answer_start`:int32类型特征。
#### xquad.de
- `id`:字符串类型特征。
- `context`:字符串类型特征。
- `question`:字符串类型特征。
- `answers`:字典类型特征,包含:
- `text`:字符串类型特征。
- `answer_start`:int32类型特征。
#### xquad.el
- `id`:字符串类型特征。
- `context`:字符串类型特征。
- `question`:字符串类型特征。
- `answers`:字典类型特征,包含:
- `text`:字符串类型特征。
- `answer_start`:int32类型特征。
#### xquad.en
- `id`:字符串类型特征。
- `context`:字符串类型特征。
- `question`:字符串类型特征。
- `answers`:字典类型特征,包含:
- `text`:字符串类型特征。
- `answer_start`:int32类型特征。
#### xquad.es
- `id`:字符串类型特征。
- `context`:字符串类型特征。
- `question`:字符串类型特征。
- `answers`:字典类型特征,包含:
- `text`:字符串类型特征。
- `answer_start`:int32类型特征。
### 数据划分
| 数据集名称 | 验证集样本数 |
| -------- | ---------: |
| xquad.ar | 1190 |
| xquad.de | 1190 |
| xquad.el | 1190 |
| xquad.en | 1190 |
| xquad.es | 1190 |
## 数据集构建
### 构建初衷
[更多信息待补充](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### 源数据
#### 初始数据收集与标准化
[更多信息待补充](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
#### 源语言生产者是谁?
[更多信息待补充](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### 标注信息
#### 标注流程
[更多信息待补充](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
#### 标注者是谁?
[更多信息待补充](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### 个人与敏感信息
[更多信息待补充](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## 数据使用注意事项
### 数据集的社会影响
[更多信息待补充](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### 偏差讨论
[更多信息待补充](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### 其他已知局限性
[更多信息待补充](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## 附加信息
### 数据集维护者
[更多信息待补充](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### 授权信息
[更多信息待补充](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### 引用信息
@article{Artetxe:etal:2019,
author = {Mikel Artetxe and Sebastian Ruder and Dani Yogatama},
title = {On the cross-lingual transferability of monolingual representations},
journal = {CoRR},
volume = {abs/1910.11856},
year = {2019},
archivePrefix = {arXiv},
eprint = {1910.11856}
}
### 贡献者
感谢[@lewtun](https://github.com/lewtun)、[@patrickvonplaten](https://github.com/patrickvonplaten)与[@thomwolf](https://github.com/thomwolf)添加该数据集。
提供机构:
maas
创建时间:
2025-04-21



