qasc

Name: qasc
Creator: maas
Published: 2025-11-12 16:35:12
License: No description available

ModelScope Community: updated 2025-11-12, indexed 2025-05-31

Download link:

https://modelscope.cn/datasets/allenai/qasc

Resource overview:

# Dataset Card for "qasc"

## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [https://allenai.org/data/qasc](https://allenai.org/data/qasc)
- **Repository:** https://github.com/allenai/qasc/
- **Paper:** [QASC: A Dataset for Question Answering via Sentence Composition](https://arxiv.org/abs/1910.11473)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Size of downloaded dataset files:** 1.61 MB
- **Size of the generated dataset:** 5.87 MB
- **Total amount of disk used:** 7.49 MB

### Dataset Summary

QASC is a question-answering dataset with a focus on sentence composition. It consists of 9,980 8-way multiple-choice questions about grade school science (8,134 train, 926 dev, 920 test), and comes with a corpus of 17M sentences.
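As a quick consistency check, the three split sizes in the summary add up to the stated total of 9,980 questions. The sketch below simply hard-codes the numbers from this card (nothing is downloaded) and reports the share of questions in each split.

```python
# Split sizes as stated in this card (train / dev / test).
split_sizes = {"train": 8134, "validation": 926, "test": 920}

total = sum(split_sizes.values())
assert total == 9980  # matches the 9,980 questions reported in the summary

# Fraction of all questions that falls in each split.
shares = {name: count / total for name, count in split_sizes.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The training split holds roughly 81.5% of the questions, with the dev and test splits at roughly 9.3% and 9.2%.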
### Supported Tasks and Leaderboards

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Languages

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Dataset Structure

### Data Instances

#### default

- **Size of downloaded dataset files:** 1.61 MB
- **Size of the generated dataset:** 5.87 MB
- **Total amount of disk used:** 7.49 MB

An example from the 'validation' split looks as follows.

```
{
    "answerKey": "F",
    "choices": {
        "label": ["A", "B", "C", "D", "E", "F", "G", "H"],
        "text": ["sand", "occurs over a wide range", "forests", "Global warming", "rapid changes occur", "local weather conditions", "measure of motion", "city life"]
    },
    "combinedfact": "Climate is generally described in terms of local weather conditions",
    "fact1": "Climate is generally described in terms of temperature and moisture.",
    "fact2": "Fire behavior is driven by local weather conditions such as winds, temperature and moisture.",
    "formatted_question": "Climate is generally described in terms of what? (A) sand (B) occurs over a wide range (C) forests (D) Global warming (E) rapid changes occur (F) local weather conditions (G) measure of motion (H) city life",
    "id": "3NGI5ARFTT4HNGVWXAMLNBMFA0U1PG",
    "question": "Climate is generally described in terms of what?"
}
```

### Data Fields

The data fields are the same among all splits.

#### default

- `id`: a `string` feature.
- `question`: a `string` feature.
- `choices`: a dictionary feature containing:
  - `text`: a list of `string` features.
  - `label`: a list of `string` features.
- `answerKey`: a `string` feature.
- `fact1`: a `string` feature.
- `fact2`: a `string` feature.
- `combinedfact`: a `string` feature.
- `formatted_question`: a `string` feature.
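The fields relate to each other in a simple way: `answerKey` names one entry of `choices["label"]`, and `choices["text"]` holds the option at the same position. The sketch below takes the validation example shown above as a plain Python dict, resolves the correct answer text, and rebuilds `formatted_question`. The helper names `answer_text` and `format_question` are hypothetical, and the `"question (A) option (B) option ..."` layout is inferred from this single example, so treat it as an assumption rather than a documented guarantee for every record.

```python
# The validation example from this card, as a plain dict (id and fact fields omitted).
example = {
    "answerKey": "F",
    "choices": {
        "label": ["A", "B", "C", "D", "E", "F", "G", "H"],
        "text": ["sand", "occurs over a wide range", "forests", "Global warming",
                 "rapid changes occur", "local weather conditions",
                 "measure of motion", "city life"],
    },
    "formatted_question": (
        "Climate is generally described in terms of what? (A) sand "
        "(B) occurs over a wide range (C) forests (D) Global warming "
        "(E) rapid changes occur (F) local weather conditions "
        "(G) measure of motion (H) city life"
    ),
    "question": "Climate is generally described in terms of what?",
}


def answer_text(record: dict) -> str:
    """Return the text of the correct choice by locating answerKey among the labels."""
    labels = record["choices"]["label"]
    texts = record["choices"]["text"]
    return texts[labels.index(record["answerKey"])]


def format_question(record: dict) -> str:
    """Rebuild formatted_question as 'question (A) opt (B) opt ...' (layout assumed from one example)."""
    options = " ".join(
        f"({label}) {text}"
        for label, text in zip(record["choices"]["label"], record["choices"]["text"])
    )
    return f"{record['question']} {options}"


print(answer_text(example))  # local weather conditions
assert format_question(example) == example["formatted_question"]
```

Because `list.index` raises `ValueError` when `answerKey` is not one of the labels (for instance, if it is empty), the lookup in `answer_text` also acts as a basic integrity check on a record.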
### Data Splits

| name  |train|validation|test|
|-------|----:|---------:|---:|
|default| 8134|       926| 920|

## Dataset Creation

### Curation Rationale

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

#### Who are the source language producers?

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Annotations

#### Annotation process

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

#### Who are the annotators?

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Personal and Sensitive Information

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Discussion of Biases

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Other Known Limitations

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Additional Information

### Dataset Curators

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Licensing Information

The dataset is released under the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.
### Citation Information

```
@article{allenai:qasc,
    author  = {Tushar Khot and Peter Clark and Michal Guerquin and Peter Jansen and Ashish Sabharwal},
    title   = {QASC: A Dataset for Question Answering via Sentence Composition},
    journal = {arXiv:1910.11473v2},
    year    = {2020},
}
```

### Contributions

Thanks to [@thomwolf](https://github.com/thomwolf), [@patrickvonplaten](https://github.com/patrickvonplaten), and [@lewtun](https://github.com/lewtun) for adding this dataset.

Provider:

maas

Created:

2025-05-27
