stanfordnlp/wikitablequestions

Name: stanfordnlp/wikitablequestions
Creator: stanfordnlp
Published: 2024-01-18 11:19:00
License: No description available

Hugging Face · Updated 2024-01-18 · Indexed 2025-05-31

Download link:

https://hf-mirror.com/datasets/stanfordnlp/wikitablequestions

Resource description:

---
annotations_creators:
- crowdsourced
language_creators:
- found
language:
- en
license:
- cc-by-4.0
multilinguality:
- monolingual
paperswithcode_id: null
pretty_name: WikiTableQuestions
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- question-answering
task_ids: []
tags:
- table-question-answering
dataset_info:
- config_name: random-split-1
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: answers
    sequence: string
  - name: table
    struct:
    - name: header
      sequence: string
    - name: rows
      sequence:
        sequence: string
    - name: name
      dtype: string
  splits:
  - name: train
    num_bytes: 30364389
    num_examples: 11321
  - name: test
    num_bytes: 11423506
    num_examples: 4344
  - name: validation
    num_bytes: 7145768
    num_examples: 2831
  download_size: 29267445
  dataset_size: 48933663
- config_name: random-split-2
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: answers
    sequence: string
  - name: table
    struct:
    - name: header
      sequence: string
    - name: rows
      sequence:
        sequence: string
    - name: name
      dtype: string
  splits:
  - name: train
    num_bytes: 30098954
    num_examples: 11314
  - name: test
    num_bytes: 11423506
    num_examples: 4344
  - name: validation
    num_bytes: 7411203
    num_examples: 2838
  download_size: 29267445
  dataset_size: 48933663
- config_name: random-split-3
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: answers
    sequence: string
  - name: table
    struct:
    - name: header
      sequence: string
    - name: rows
      sequence:
        sequence: string
    - name: name
      dtype: string
  splits:
  - name: train
    num_bytes: 28778697
    num_examples: 11314
  - name: test
    num_bytes: 11423506
    num_examples: 4344
  - name: validation
    num_bytes: 8731460
    num_examples: 2838
  download_size: 29267445
  dataset_size: 48933663
- config_name: random-split-4
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: answers
    sequence: string
  - name: table
    struct:
    - name: header
      sequence: string
    - name: rows
      sequence:
        sequence: string
    - name: name
      dtype: string
  splits:
  - name: train
    num_bytes: 30166421
    num_examples: 11321
  - name: test
    num_bytes: 11423506
    num_examples: 4344
  - name: validation
    num_bytes: 7343736
    num_examples: 2831
  download_size: 29267445
  dataset_size: 48933663
- config_name: random-split-5
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: answers
    sequence: string
  - name: table
    struct:
    - name: header
      sequence: string
    - name: rows
      sequence:
        sequence: string
    - name: name
      dtype: string
  splits:
  - name: train
    num_bytes: 30333964
    num_examples: 11316
  - name: test
    num_bytes: 11423506
    num_examples: 4344
  - name: validation
    num_bytes: 7176193
    num_examples: 2836
  download_size: 29267445
  dataset_size: 48933663
---

# Dataset Card for WikiTableQuestions

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [WikiTableQuestions homepage](https://nlp.stanford.edu/software/sempre/wikitable)
- **Repository:** [WikiTableQuestions repository](https://github.com/ppasupat/WikiTableQuestions)
- **Paper:** [Compositional Semantic Parsing on Semi-Structured Tables](https://arxiv.org/abs/1508.00305)
- **Leaderboard:** [WikiTableQuestions leaderboard on Papers with Code](https://paperswithcode.com/dataset/wikitablequestions)
- **Point of Contact:** [Needs More Information]

### Dataset Summary

The WikiTableQuestions dataset is a large-scale dataset for the task of question answering on semi-structured tables.

### Supported Tasks and Leaderboards

- `question-answering`
- `table-question-answering`

### Languages

English (`en`).

## Dataset Structure

### Data Instances

#### default

- **Size of downloaded dataset files:** 29.27 MB
- **Size of the generated dataset:** 47.90 MB
- **Total amount of disk used:** 77.18 MB

An example from the `validation` split looks as follows:

```
{
  "id": "nt-0",
  "question": "what was the last year where this team was a part of the usl a-league?",
  "answers": ["2004"],
  "table": {
    "header": ["Year", "Division", "League", ...],
    "name": "csv/204-csv/590.csv",
    "rows": [
      ["2001", "2", "USL A-League", ...],
      ["2002", "2", "USL A-League", ...],
      ...
    ]
  }
}
```

### Data Fields

The data fields are the same among all splits.

#### default

- `id`: a `string` feature.
- `question`: a `string` feature.
- `answers`: a `list` of `string` features.
- `table`: a dictionary feature containing:
  - `header`: a `list` of `string` features.
  - `rows`: a `list` of `list` of `string` features.
  - `name`: a `string` feature.

### Data Splits

| name    | train | validation | test |
|---------|------:|-----------:|-----:|
| default | 11321 |       2831 | 4344 |

The metadata above also lists five configurations, `random-split-1` through `random-split-5`. The `default` figures match `random-split-1`. The example counts per configuration are:

| name           | train | validation | test |
|----------------|------:|-----------:|-----:|
| random-split-1 | 11321 |       2831 | 4344 |
| random-split-2 | 11314 |       2838 | 4344 |
| random-split-3 | 11314 |       2838 | 4344 |
| random-split-4 | 11321 |       2831 | 4344 |
| random-split-5 | 11316 |       2836 | 4344 |

## Dataset Creation

### Curation Rationale

[Needs More Information]

### Source Data

#### Initial Data Collection and Normalization

[Needs More Information]

#### Who are the source language producers?

[Needs More Information]

### Annotations

#### Annotation process

[Needs More Information]

#### Who are the annotators?

[Needs More Information]

### Personal and Sensitive Information

[Needs More Information]

## Considerations for Using the Data

### Social Impact of Dataset

[Needs More Information]

### Discussion of Biases

[Needs More Information]

### Other Known Limitations

[Needs More Information]

## Additional Information

### Dataset Curators

Panupong Pasupat and Percy Liang

### Licensing Information

Creative Commons Attribution Share Alike 4.0 International

### Citation Information

```
@inproceedings{pasupat-liang-2015-compositional,
    title = "Compositional Semantic Parsing on Semi-Structured Tables",
    author = "Pasupat, Panupong and Liang, Percy",
    booktitle = "Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = jul,
    year = "2015",
    address = "Beijing, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P15-1142",
    doi = "10.3115/v1/P15-1142",
    pages = "1470--1480",
}
```

### Contributions

Thanks to [@SivilTaram](https://github.com/SivilTaram) for adding this dataset.
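The record layout described under Data Fields (a `table` holding a `header` list and a `rows` list of lists) can be turned into per-row dictionaries for simple lookups. The following is a minimal sketch, not part of the original card: the helper `table_to_records` and the toy record are hypothetical, and the toy table's values are illustrative rather than taken from the dataset.

```python
from typing import Dict, List


def table_to_records(table: Dict) -> List[Dict[str, str]]:
    """Pair each row's cells with the header names (hypothetical helper)."""
    header = table["header"]
    return [dict(zip(header, row)) for row in table["rows"]]


# Toy record in the card's schema; the values are illustrative, not real data.
example = {
    "id": "toy-0",
    "question": "what was the last year where this team was a part of the usl a-league?",
    "answers": ["2002"],
    "table": {
        "header": ["Year", "Division", "League"],
        "name": "toy.csv",
        "rows": [
            ["2001", "2", "USL A-League"],
            ["2002", "2", "USL A-League"],
            ["2003", "3", "Other League"],
        ],
    },
}

records = table_to_records(example["table"])

# All cells are strings, so compare years numerically via key=int.
years = [r["Year"] for r in records if r["League"] == "USL A-League"]
predicted = max(years, key=int)

print(predicted, predicted in example["answers"])
```

Because every cell is stored as a string, numeric comparisons need an explicit conversion, which is why the sketch uses `key=int` rather than comparing the strings directly.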

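The split sizes listed in the card's metadata can be cross-checked with a few lines of arithmetic. The sketch below copies the `num_examples` values for the five `random-split-*` configurations and sums them. The observation that train plus validation is the same in every configuration suggests, though the card does not say so, that the configurations re-partition one training pool while sharing a single test set.

```python
# (train, validation, test) example counts copied from the card's metadata.
SPLITS = {
    "random-split-1": (11321, 2831, 4344),
    "random-split-2": (11314, 2838, 4344),
    "random-split-3": (11314, 2838, 4344),
    "random-split-4": (11321, 2831, 4344),
    "random-split-5": (11316, 2836, 4344),
}

# Train + validation pool size per configuration.
pools = {name: train + val for name, (train, val, _) in SPLITS.items()}

# Total number of examples per configuration.
totals = {name: sum(counts) for name, counts in SPLITS.items()}

for name in SPLITS:
    print(f"{name}: pool={pools[name]} total={totals[name]}")
```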
