stanfordnlp/wikitablequestions
Hugging Face · updated 2024-01-18 · indexed 2025-05-31
Download link:
https://hf-mirror.com/datasets/stanfordnlp/wikitablequestions
---
annotations_creators:
- crowdsourced
language_creators:
- found
language:
- en
license:
- cc-by-4.0
multilinguality:
- monolingual
paperswithcode_id: null
pretty_name: WikiTableQuestions
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- question-answering
task_ids: []
tags:
- table-question-answering
dataset_info:
- config_name: random-split-1
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: answers
    sequence: string
  - name: table
    struct:
    - name: header
      sequence: string
    - name: rows
      sequence:
        sequence: string
    - name: name
      dtype: string
  splits:
  - name: train
    num_bytes: 30364389
    num_examples: 11321
  - name: test
    num_bytes: 11423506
    num_examples: 4344
  - name: validation
    num_bytes: 7145768
    num_examples: 2831
  download_size: 29267445
  dataset_size: 48933663
- config_name: random-split-2
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: answers
    sequence: string
  - name: table
    struct:
    - name: header
      sequence: string
    - name: rows
      sequence:
        sequence: string
    - name: name
      dtype: string
  splits:
  - name: train
    num_bytes: 30098954
    num_examples: 11314
  - name: test
    num_bytes: 11423506
    num_examples: 4344
  - name: validation
    num_bytes: 7411203
    num_examples: 2838
  download_size: 29267445
  dataset_size: 48933663
- config_name: random-split-3
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: answers
    sequence: string
  - name: table
    struct:
    - name: header
      sequence: string
    - name: rows
      sequence:
        sequence: string
    - name: name
      dtype: string
  splits:
  - name: train
    num_bytes: 28778697
    num_examples: 11314
  - name: test
    num_bytes: 11423506
    num_examples: 4344
  - name: validation
    num_bytes: 8731460
    num_examples: 2838
  download_size: 29267445
  dataset_size: 48933663
- config_name: random-split-4
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: answers
    sequence: string
  - name: table
    struct:
    - name: header
      sequence: string
    - name: rows
      sequence:
        sequence: string
    - name: name
      dtype: string
  splits:
  - name: train
    num_bytes: 30166421
    num_examples: 11321
  - name: test
    num_bytes: 11423506
    num_examples: 4344
  - name: validation
    num_bytes: 7343736
    num_examples: 2831
  download_size: 29267445
  dataset_size: 48933663
- config_name: random-split-5
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: answers
    sequence: string
  - name: table
    struct:
    - name: header
      sequence: string
    - name: rows
      sequence:
        sequence: string
    - name: name
      dtype: string
  splits:
  - name: train
    num_bytes: 30333964
    num_examples: 11316
  - name: test
    num_bytes: 11423506
    num_examples: 4344
  - name: validation
    num_bytes: 7176193
    num_examples: 2836
  download_size: 29267445
  dataset_size: 48933663
---
# Dataset Card for WikiTableQuestions
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** [WikiTableQuestions homepage](https://nlp.stanford.edu/software/sempre/wikitable)
- **Repository:** [WikiTableQuestions repository](https://github.com/ppasupat/WikiTableQuestions)
- **Paper:** [Compositional Semantic Parsing on Semi-Structured Tables](https://arxiv.org/abs/1508.00305)
- **Leaderboard:** [WikiTableQuestions leaderboard on Papers with Code](https://paperswithcode.com/dataset/wikitablequestions)
- **Point of Contact:** [Needs More Information]
### Dataset Summary
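The WikiTableQuestions dataset is a large-scale dataset for question answering over semi-structured tables. Each example pairs a natural-language question with a table taken from Wikipedia and one or more answer strings.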
### Supported Tasks and Leaderboards
question-answering, table-question-answering
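Predictions are typically scored by accuracy against each example's `answers` list. The following is a minimal sketch of such a check, assuming exact string matching after lowercasing and whitespace collapsing; it is not the official evaluation script and is likely stricter than it. The helper names `normalize`, `exact_match` and `accuracy`, and the shape of the `predictions` mapping (example `id` to a list of answer strings), are illustrative assumptions.

```python
from collections import Counter


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace; a deliberately simple normalization."""
    return " ".join(value.lower().split())


def exact_match(predicted: list[str], gold: list[str]) -> bool:
    """True if both lists hold the same normalized strings, ignoring order."""
    return Counter(map(normalize, predicted)) == Counter(map(normalize, gold))


def accuracy(predictions: dict[str, list[str]], examples: list[dict]) -> float:
    """Fraction of examples whose prediction (looked up by `id`) matches `answers`."""
    if not examples:
        return 0.0
    hits = sum(
        exact_match(predictions.get(ex["id"], []), ex["answers"]) for ex in examples
    )
    return hits / len(examples)


# Toy check using the id and gold answer of the validation example in this card.
examples = [{"id": "nt-0", "answers": ["2004"]}]
print(accuracy({"nt-0": [" 2004 "]}, examples))  # 1.0
```

Comparing multisets rather than ordered lists means a prediction is not penalized for listing several answers in a different order; a missing prediction counts as wrong.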
### Languages
en
## Dataset Structure
### Data Instances
#### default
- **Size of downloaded dataset files:** 29.27 MB
- **Size of the generated dataset:** 47.90 MB
- **Total amount of disk used:** 77.18 MB
An example from the `validation` split looks as follows:
```
{
  "id": "nt-0",
  "question": "what was the last year where this team was a part of the usl a-league?",
  "answers": ["2004"],
  "table": {
    "header": ["Year", "Division", "League", ...],
    "name": "csv/204-csv/590.csv",
    "rows": [
      ["2001", "2", "USL A-League", ...],
      ["2002", "2", "USL A-League", ...],
      ...
    ]
  }
}
```
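Because `table` stores the header and the rows separately, a common first step is to zip them into one record per row. The sketch below does this for a truncated copy of the example above, keeping only the three columns and two rows that are shown (the real table has more). `table_to_records` is a hypothetical helper, not part of the `datasets` library. Every cell is a string, so numeric comparisons need an explicit conversion, and a table with repeated column names would lose columns when turned into dicts.

```python
def table_to_records(table: dict) -> list[dict[str, str]]:
    """Pair each row's cells with the header, producing one dict per row."""
    header = table["header"]
    records = []
    for row in table["rows"]:
        if len(row) != len(header):
            raise ValueError(f"row has {len(row)} cells, header has {len(header)}")
        records.append(dict(zip(header, row)))
    return records


# Truncated copy of the validation example: only the columns and rows shown above.
example_table = {
    "header": ["Year", "Division", "League"],
    "name": "csv/204-csv/590.csv",
    "rows": [
        ["2001", "2", "USL A-League"],
        ["2002", "2", "USL A-League"],
    ],
}

records = table_to_records(example_table)
print(records[0])  # {'Year': '2001', 'Division': '2', 'League': 'USL A-League'}

# Cells are strings, so convert before taking a numeric maximum.
years = [int(r["Year"]) for r in records if r["League"] == "USL A-League"]
print(max(years))  # 2002 (only the two rows reproduced here)
```

On the full table the same filter would be expected to reach the gold answer `2004`; the truncated copy stops at 2002 because the remaining rows are elided in this card.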
### Data Fields
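The data fields are the same across all splits.
#### default
- `id`: a `string` feature; the question identifier (e.g. `nt-0`).
- `question`: a `string` feature; the natural-language question.
- `answers`: a `list` of `string` features; the gold answer(s).
- `table`: a dictionary feature containing:
  - `header`: a `list` of `string` features; the column names.
  - `rows`: a `list` of `list` of `string` features; the table cells, one inner list per row.
  - `name`: a `string` feature; the path of the source table file (e.g. `csv/204-csv/590.csv`).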
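The field types listed above can be checked mechanically, which is useful after converting or re-serializing the data. The following is a minimal sketch assuming examples arrive as plain Python dicts (as they do when iterating a loaded split); `validate_example` is a hypothetical helper, and it checks types only, not whether row lengths match the header.

```python
from typing import Any


def validate_example(ex: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for key in ("id", "question"):
        if not isinstance(ex.get(key), str):
            problems.append(f"{key} must be a string")
    answers = ex.get("answers")
    if not (isinstance(answers, list) and all(isinstance(a, str) for a in answers)):
        problems.append("answers must be a list of strings")
    table = ex.get("table")
    if not isinstance(table, dict):
        # Nothing further to check without a table dict.
        return problems + ["table must be a dict"]
    header = table.get("header")
    if not (isinstance(header, list) and all(isinstance(h, str) for h in header)):
        problems.append("table.header must be a list of strings")
    rows = table.get("rows")
    if not (
        isinstance(rows, list)
        and all(isinstance(r, list) and all(isinstance(c, str) for c in r) for r in rows)
    ):
        problems.append("table.rows must be a list of lists of strings")
    if not isinstance(table.get("name"), str):
        problems.append("table.name must be a string")
    return problems


example = {
    "id": "nt-0",
    "question": "what was the last year where this team was a part of the usl a-league?",
    "answers": ["2004"],
    "table": {
        "header": ["Year", "Division", "League"],
        "rows": [["2001", "2", "USL A-League"]],
        "name": "csv/204-csv/590.csv",
    },
}
print(validate_example(example))  # []
```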
### Data Splits
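| name | train | validation | test |
|------|------:|-----------:|-----:|
| random-split-1 | 11321 | 2831 | 4344 |
| random-split-2 | 11314 | 2838 | 4344 |
| random-split-3 | 11314 | 2838 | 4344 |
| random-split-4 | 11321 | 2831 | 4344 |
| random-split-5 | 11316 | 2836 | 4344 |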
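The `dataset_info` metadata at the top of this card lists five configurations, `random-split-1` to `random-split-5`. Their split sizes suggest how they relate: in every configuration, train plus validation comes to the same 14,152 examples, and the test split has the same size (4,344) and the same `num_bytes` (11423506). That is consistent with each configuration re-partitioning one pool into train and validation while holding the test set fixed. This reading is an inference from the numbers, not a statement made by the card; the sketch below simply checks the arithmetic.

```python
# Split sizes copied from the dataset_info metadata at the top of this card.
SPLITS = {
    "random-split-1": {"train": 11321, "validation": 2831, "test": 4344},
    "random-split-2": {"train": 11314, "validation": 2838, "test": 4344},
    "random-split-3": {"train": 11314, "validation": 2838, "test": 4344},
    "random-split-4": {"train": 11321, "validation": 2831, "test": 4344},
    "random-split-5": {"train": 11316, "validation": 2836, "test": 4344},
}

for name, sizes in SPLITS.items():
    pool = sizes["train"] + sizes["validation"]  # examples outside the test split
    total = pool + sizes["test"]
    print(f"{name}: train+validation={pool}, test={sizes['test']}, total={total}")
```

Every line printed reports `train+validation=14152, test=4344, total=18496`.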
## Dataset Creation
### Curation Rationale
[Needs More Information]
### Source Data
#### Initial Data Collection and Normalization
[Needs More Information]
#### Who are the source language producers?
[Needs More Information]
### Annotations
#### Annotation process
[Needs More Information]
#### Who are the annotators?
[Needs More Information]
### Personal and Sensitive Information
[Needs More Information]
## Considerations for Using the Data
### Social Impact of Dataset
[Needs More Information]
### Discussion of Biases
[Needs More Information]
### Other Known Limitations
[Needs More Information]
## Additional Information
### Dataset Curators
Panupong Pasupat and Percy Liang
### Licensing Information
Creative Commons Attribution Share Alike 4.0 International (CC BY-SA 4.0)
### Citation Information
```
@inproceedings{pasupat-liang-2015-compositional,
title = "Compositional Semantic Parsing on Semi-Structured Tables",
author = "Pasupat, Panupong and Liang, Percy",
booktitle = "Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = jul,
year = "2015",
address = "Beijing, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/P15-1142",
doi = "10.3115/v1/P15-1142",
pages = "1470--1480",
}
```
### Contributions
Thanks to [@SivilTaram](https://github.com/SivilTaram) for adding this dataset.
Provided by: stanfordnlp



