
wikitablequestions

Source: ModelScope community. Updated 2025-11-27; indexed 2025-11-29.
Download link:
https://modelscope.cn/datasets/stanfordnlp/wikitablequestions
Resource description:
# Dataset Card for WikiTableQuestions

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)

## Dataset Description

- **Homepage:** [WikiTableQuestions homepage](https://nlp.stanford.edu/software/sempre/wikitable)
- **Repository:** [WikiTableQuestions repository](https://github.com/ppasupat/WikiTableQuestions)
- **Paper:** [Compositional Semantic Parsing on Semi-Structured Tables](https://arxiv.org/abs/1508.00305)
- **Leaderboard:** [WikiTableQuestions leaderboard on Papers with Code](https://paperswithcode.com/dataset/wikitablequestions)
- **Point of Contact:** [Needs More Information]

### Dataset Summary

The WikiTableQuestions dataset is a large-scale dataset for the task of question answering on semi-structured tables.

### Supported Tasks and Leaderboards

question-answering, table-question-answering

### Languages

en

## Dataset Structure

### Data Instances

#### default

- **Size of downloaded dataset files:** 29.27 MB
- **Size of the generated dataset:** 47.90 MB
- **Total amount of disk used:** 77.18 MB

An example of 'validation' looks as follows:

```
{
  "id": "nt-0",
  "question": "what was the last year where this team was a part of the usl a-league?",
  "answers": ["2004"],
  "table": {
    "header": ["Year", "Division", "League", ...],
    "name": "csv/204-csv/590.csv",
    "rows": [
      ["2001", "2", "USL A-League", ...],
      ["2002", "2", "USL A-League", ...],
      ...
    ]
  }
}
```

### Data Fields

The data fields are the same among all splits.

#### default

- `id`: a `string` feature.
- `question`: a `string` feature.
- `answers`: a `list` of `string` features.
- `table`: a dictionary feature containing:
  - `header`: a `list` of `string` features.
  - `rows`: a `list` of `list` of `string` features.
  - `name`: a `string` feature.

### Data Splits

| name    | train | validation | test |
|---------|------:|-----------:|-----:|
| default | 11321 |       2831 | 4344 |

## Dataset Creation

### Curation Rationale

[Needs More Information]

### Source Data

#### Initial Data Collection and Normalization

[Needs More Information]

#### Who are the source language producers?

[Needs More Information]

### Annotations

#### Annotation process

[Needs More Information]

#### Who are the annotators?

[Needs More Information]

### Personal and Sensitive Information

[Needs More Information]

## Considerations for Using the Data

### Social Impact of Dataset

[Needs More Information]

### Discussion of Biases

[Needs More Information]

### Other Known Limitations

[Needs More Information]

## Additional Information

### Dataset Curators

Panupong Pasupat and Percy Liang

### Licensing Information

Creative Commons Attribution Share Alike 4.0 International

### Citation Information

```
@inproceedings{pasupat-liang-2015-compositional,
    title = "Compositional Semantic Parsing on Semi-Structured Tables",
    author = "Pasupat, Panupong and Liang, Percy",
    booktitle = "Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = jul,
    year = "2015",
    address = "Beijing, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P15-1142",
    doi = "10.3115/v1/P15-1142",
    pages = "1470--1480",
}
```

### Contributions

Thanks to [@SivilTaram](https://github.com/SivilTaram) for adding this dataset.
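
### Usage Sketch

The `table` field described under Data Fields stores the header and the rows separately, so most consumers need to pair each cell with its column name before they can query it. The following is a minimal sketch of that step in plain Python, not part of the official dataset tooling. The helper name `table_to_records` is hypothetical, and the record is a cut-down copy of the card's validation example that keeps only the cells shown there; the columns and rows elided with `...` are omitted rather than reconstructed.

```python
from typing import Dict, List

# Hypothetical record: only the cells visible in the card's example are kept.
# Columns and rows elided there ("...") are omitted, not reconstructed.
record = {
    "id": "nt-0",
    "question": "what was the last year where this team was a part of the usl a-league?",
    "answers": ["2004"],
    "table": {
        "header": ["Year", "Division", "League"],
        "name": "csv/204-csv/590.csv",
        "rows": [
            ["2001", "2", "USL A-League"],
            ["2002", "2", "USL A-League"],
        ],
    },
}


def table_to_records(table: Dict) -> List[Dict[str, str]]:
    """Pair every cell with its column name, producing one dict per table row."""
    header = table["header"]
    records = []
    for row in table["rows"]:
        # Reject ragged rows instead of silently dropping or misaligning cells.
        if len(row) != len(header):
            raise ValueError(f"row has {len(row)} cells, expected {len(header)}")
        records.append(dict(zip(header, row)))
    return records


rows = table_to_records(record["table"])

# Collect the years in which the team played in the USL A-League.
league_years = [r["Year"] for r in rows if r["League"] == "USL A-League"]
print(league_years)  # ['2001', '2002']
```

Because the record is truncated, the gold answer `"2004"` does not appear among these rows; on a full table the same filter would cover every year. All cells are strings, so numeric comparisons need an explicit conversion, and a table with repeated column names would need a different representation than a dict per row.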

Provider: maas
Created: 2025-10-03