wikitablequestions
Source: ModelScope community. Updated 2025-11-27; indexed 2025-11-29.
Download link:
https://modelscope.cn/datasets/stanfordnlp/wikitablequestions
Resource description:
# Dataset Card for WikiTableQuestions
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** [WikiTableQuestions homepage](https://nlp.stanford.edu/software/sempre/wikitable)
- **Repository:** [WikiTableQuestions repository](https://github.com/ppasupat/WikiTableQuestions)
- **Paper:** [Compositional Semantic Parsing on Semi-Structured Tables](https://arxiv.org/abs/1508.00305)
- **Leaderboard:** [WikiTableQuestions leaderboard on Papers with Code](https://paperswithcode.com/dataset/wikitablequestions)
- **Point of Contact:** [Needs More Information]
### Dataset Summary
WikiTableQuestions is a large-scale dataset for question answering over semi-structured Wikipedia tables. Each example pairs a natural-language question with a table and a list of one or more answer strings.
### Supported Tasks and Leaderboards
question-answering, table-question-answering
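Because each example carries a list of answer strings, predictions for these tasks can be scored by comparing answer sets. The following is a minimal sketch of such a scorer, not the official metric: the `normalize`, `is_correct`, and `accuracy` helpers are hypothetical names, and the normalization (lowercasing, whitespace collapsing) is an assumption. The evaluation code in the dataset repository may treat numbers, dates, and formatting differently.

```python
from typing import List, Sequence


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace (a deliberately simple assumption)."""
    return " ".join(value.lower().split())


def is_correct(predicted: Sequence[str], gold: Sequence[str]) -> bool:
    """True when predicted and gold answers match as unordered sets."""
    return {normalize(p) for p in predicted} == {normalize(g) for g in gold}


def accuracy(predictions: List[Sequence[str]], references: List[Sequence[str]]) -> float:
    """Fraction of examples whose predicted answer set equals the gold set."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(is_correct(p, g) for p, g in zip(predictions, references))
    return correct / len(references)


# Toy predictions and references, made up for illustration only.
score = accuracy(
    [["2004"], ["usl a-league"]],
    [["2004"], ["USL A-League", "USL First Division"]],
)
print(score)  # 0.5: the second prediction misses one gold answer
```

Set comparison ignores answer order and duplicates, which suits list-valued answers; swap in a stricter or looser `normalize` as the use case requires.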
### Languages
English (`en`)
## Dataset Structure
### Data Instances
#### default
- **Size of downloaded dataset files:** 29.27 MB
- **Size of the generated dataset:** 47.90 MB
- **Total amount of disk used:** 77.18 MB
An example from the `validation` split looks as follows:
```
{
  "id": "nt-0",
  "question": "what was the last year where this team was a part of the usl a-league?",
  "answers": ["2004"],
  "table": {
    "header": ["Year", "Division", "League", ...],
    "name": "csv/204-csv/590.csv",
    "rows": [
      ["2001", "2", "USL A-League", ...],
      ["2002", "2", "USL A-League", ...],
      ...
    ]
  }
}
```
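A table stored as a `header` plus `rows` is often easier to query once each row is keyed by column name. The sketch below shows one way to do that. The `record` is a toy value modeled on the example above, truncated to the three visible columns and two visible rows, so it is not a complete dataset record; `table_to_records` is a hypothetical helper name.

```python
from typing import Dict, List

# Toy record: only the columns and rows visible in the example above.
# The elided cells and rows are NOT reconstructed here.
record = {
    "id": "nt-0",
    "question": "what was the last year where this team was a part of the usl a-league?",
    "answers": ["2004"],
    "table": {
        "header": ["Year", "Division", "League"],
        "name": "csv/204-csv/590.csv",
        "rows": [
            ["2001", "2", "USL A-League"],
            ["2002", "2", "USL A-League"],
        ],
    },
}


def table_to_records(table: Dict) -> List[Dict[str, str]]:
    """Pair each cell with its column name, producing one dict per row."""
    header = table["header"]
    return [dict(zip(header, row)) for row in table["rows"]]


rows = table_to_records(record["table"])
print(rows[0]["League"])  # USL A-League
print([r["Year"] for r in rows if r["League"] == "USL A-League"])  # ['2001', '2002']
```

Note that `dict(zip(...))` keeps only the last value when two columns share a name, so tables with duplicate headers would need a different representation.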
### Data Fields
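The data fields are the same across all splits.
#### default
- `id`: a `string` feature; the example identifier (e.g. `nt-0`).
- `question`: a `string` feature; the question text.
- `answers`: a `list` of `string` features; the answers to the question.
- `table`: a dictionary feature containing:
  - `header`: a `list` of `string` features; the column headers.
  - `rows`: a `list` of `list` of `string` features; the table rows.
  - `name`: a `string` feature; the name of the source table file (e.g. `csv/204-csv/590.csv`).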
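The field list above can be written down as Python types and checked at load time. This is a minimal sketch rather than anything shipped with the dataset: the `Table` and `Example` classes and the `schema_problems` helper are hypothetical names, and the check that every row is as wide as the header is an assumption the card does not state.

```python
from typing import List, TypedDict


class Table(TypedDict):
    header: List[str]
    rows: List[List[str]]
    name: str


class Example(TypedDict):
    id: str
    question: str
    answers: List[str]
    table: Table


def _is_str_list(value: object) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def schema_problems(example: dict) -> List[str]:
    """Return human-readable schema problems; an empty list means none found."""
    problems: List[str] = []
    for key in ("id", "question"):
        if not isinstance(example.get(key), str):
            problems.append(f"`{key}` must be a string")
    if not _is_str_list(example.get("answers")):
        problems.append("`answers` must be a list of strings")
    table = example.get("table")
    if not isinstance(table, dict):
        problems.append("`table` must be a dictionary")
        return problems
    if not isinstance(table.get("name"), str):
        problems.append("`table.name` must be a string")
    header, rows = table.get("header"), table.get("rows")
    if not _is_str_list(header):
        problems.append("`table.header` must be a list of strings")
    if not (isinstance(rows, list) and all(_is_str_list(r) for r in rows)):
        problems.append("`table.rows` must be a list of lists of strings")
    elif _is_str_list(header):
        # Assumption (not stated in the card): each row matches the header width.
        for i, row in enumerate(rows):
            if len(row) != len(header):
                problems.append(f"row {i} has {len(row)} cells, expected {len(header)}")
    return problems


# Toy example, made up for illustration only.
sample: Example = {
    "id": "toy-0",
    "question": "which league is listed?",
    "answers": ["USL A-League"],
    "table": {"header": ["Year", "League"], "rows": [["2001", "USL A-League"]], "name": "toy.csv"},
}
print(schema_problems(sample))  # []
```

Returning a list of problems instead of raising on the first one makes it easier to audit a whole split in one pass.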
### Data Splits
| name |train|validation|test |
|-------|----:|---------:|----:|
|default|11321| 2831|4344|
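The split sizes in the table add up to 18,496 examples. A short sketch of that arithmetic, using only the counts above, shows each split's share of the total:

```python
# Split sizes copied from the table above.
splits = {"train": 11321, "validation": 2831, "test": 4344}

total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name:<10} {count:>6} {shares[name]:6.1%}")
print(f"{'total':<10} {total:>6}")
```

Roughly 61% of the examples fall in `train`, 15% in `validation`, and 23% in `test`.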
## Dataset Creation
### Curation Rationale
[Needs More Information]
### Source Data
#### Initial Data Collection and Normalization
[Needs More Information]
#### Who are the source language producers?
[Needs More Information]
### Annotations
#### Annotation process
[Needs More Information]
#### Who are the annotators?
[Needs More Information]
### Personal and Sensitive Information
[Needs More Information]
## Considerations for Using the Data
### Social Impact of Dataset
[Needs More Information]
### Discussion of Biases
[Needs More Information]
### Other Known Limitations
[Needs More Information]
## Additional Information
### Dataset Curators
Panupong Pasupat and Percy Liang
### Licensing Information
Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
### Citation Information
```
@inproceedings{pasupat-liang-2015-compositional,
    title = "Compositional Semantic Parsing on Semi-Structured Tables",
    author = "Pasupat, Panupong and Liang, Percy",
    booktitle = "Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = jul,
    year = "2015",
    address = "Beijing, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P15-1142",
    doi = "10.3115/v1/P15-1142",
    pages = "1470--1480",
}
```
### Contributions
Thanks to [@SivilTaram](https://github.com/SivilTaram) for adding this dataset.
Provider:
maas
Created:
2025-10-03



