wikitablequestions

Name: wikitablequestions
Creator: maas
Published: 2025-11-27 16:50:50
License: No description provided

ModelScope community. Updated 2025-11-27. Indexed 2025-11-29.

Download link:

https://modelscope.cn/datasets/stanfordnlp/wikitablequestions

Description:

# Dataset Card for WikiTableQuestions

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)

## Dataset Description

- **Homepage:** [WikiTableQuestions homepage](https://nlp.stanford.edu/software/sempre/wikitable)
- **Repository:** [WikiTableQuestions repository](https://github.com/ppasupat/WikiTableQuestions)
- **Paper:** [Compositional Semantic Parsing on Semi-Structured Tables](https://arxiv.org/abs/1508.00305)
- **Leaderboard:** [WikiTableQuestions leaderboard on Papers with Code](https://paperswithcode.com/dataset/wikitablequestions)
- **Point of Contact:** [Needs More Information]

### Dataset Summary

The WikiTableQuestions dataset is a large-scale dataset for the task of question answering on semi-structured tables.

### Supported Tasks and Leaderboards

question-answering, table-question-answering

### Languages

en

## Dataset Structure

### Data Instances

#### default

- **Size of downloaded dataset files:** 29.27 MB
- **Size of the generated dataset:** 47.90 MB
- **Total amount of disk used:** 77.18 MB

An example from the `validation` split looks as follows:

```
{
  "id": "nt-0",
  "question": "what was the last year where this team was a part of the usl a-league?",
  "answers": ["2004"],
  "table": {
    "header": ["Year", "Division", "League", ...],
    "name": "csv/204-csv/590.csv",
    "rows": [
      ["2001", "2", "USL A-League", ...],
      ["2002", "2", "USL A-League", ...],
      ...
    ]
  }
}
```

### Data Fields

The data fields are the same among all splits.

#### default

- `id`: a `string` feature.
- `question`: a `string` feature (the question text).
- `answers`: a `list` of `string` features (the answers to the question).
- `table`: a dictionary feature containing:
  - `header`: a `list` of `string` features (the table header).
  - `rows`: a `list` of `list` of `string` features (the table rows).
  - `name`: a `string` feature (the name of the corresponding table file).

### Data Splits

| name    | train | validation | test |
|---------|------:|-----------:|-----:|
| default | 11321 |       2831 | 4344 |

## Dataset Creation

### Curation Rationale

[Needs More Information]

### Source Data

#### Initial Data Collection and Normalization

[Needs More Information]

#### Who are the source language producers?

[Needs More Information]

### Annotations

#### Annotation process

[Needs More Information]

#### Who are the annotators?

[Needs More Information]

### Personal and Sensitive Information

[Needs More Information]

## Considerations for Using the Data

### Social Impact of Dataset

[Needs More Information]

### Discussion of Biases

[Needs More Information]

### Other Known Limitations

[Needs More Information]

## Additional Information

### Dataset Curators

Panupong Pasupat and Percy Liang

### Licensing Information

Creative Commons Attribution Share Alike 4.0 International (CC BY-SA 4.0)

### Citation Information

```
@inproceedings{pasupat-liang-2015-compositional,
    title = "Compositional Semantic Parsing on Semi-Structured Tables",
    author = "Pasupat, Panupong and Liang, Percy",
    booktitle = "Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = jul,
    year = "2015",
    address = "Beijing, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P15-1142",
    doi = "10.3115/v1/P15-1142",
    pages = "1470--1480",
}
```

### Contributions

Thanks to [@SivilTaram](https://github.com/SivilTaram) for adding this dataset.
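
The record layout described under Data Fields stores a table as a `header` list plus a list of `rows`, with every cell held as a string. The following is a minimal sketch of how such a record might be turned into per-row dictionaries. The helper `table_to_dicts` is hypothetical and not part of the dataset or any library, and the toy record contains only the three columns and two rows visible in the card's truncated example, so values computed from it do not reproduce the listed answer. The sketch also assumes header names are unique within a table.

```python
from typing import Dict, List

# Toy record following the card's field layout. Only the columns and rows
# visible in the card's truncated example are included; the real table is larger.
record = {
    "id": "nt-0",
    "question": "what was the last year where this team was a part of the usl a-league?",
    "answers": ["2004"],
    "table": {
        "header": ["Year", "Division", "League"],
        "name": "csv/204-csv/590.csv",
        "rows": [
            ["2001", "2", "USL A-League"],
            ["2002", "2", "USL A-League"],
        ],
    },
}


def table_to_dicts(table: Dict) -> List[Dict[str, str]]:
    """Pair each cell with its column header (hypothetical helper).

    Assumes header names are unique; duplicate names would overwrite each other.
    """
    header = table["header"]
    result = []
    for row in table["rows"]:
        if len(row) != len(header):
            raise ValueError(f"row width {len(row)} != header width {len(header)}")
        result.append(dict(zip(header, row)))
    return result


rows = table_to_dicts(record["table"])

# Every cell is a string, so numeric comparisons need an explicit cast.
years = [int(r["Year"]) for r in rows if r["League"] == "USL A-League"]
print(rows[0])
print(years)
```

Keeping the cells as strings until a comparison needs a number mirrors how the card declares the fields, and avoids guessing a type for columns that mix text and digits.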

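
As a quick check on the Data Splits table, the sketch below sums the three split sizes and prints each split's share of the total. The counts are copied from the table; the variable names are illustrative only.

```python
# Example counts copied from the Data Splits table of the card.
split_sizes = {"train": 11321, "validation": 2831, "test": 4344}

total = sum(split_sizes.values())
shares = {name: count / total for name, count in split_sizes.items()}

print(f"total examples: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The three splits together hold 18496 examples in this configuration, with the training split accounting for roughly three fifths of them.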

Provider:

maas

Created:

2025-10-03

Collected summary

Dataset introduction