quartz
ModelScope community listing (updated 2025-11-12, indexed 2025-05-31)
Download link: https://modelscope.cn/datasets/allenai/quartz
# Dataset Card for "quartz"
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** [https://allenai.org/data/quartz](https://allenai.org/data/quartz)
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Size of downloaded dataset files:** 0.49 MB
- **Size of the generated dataset:** 1.72 MB
- **Total amount of disk used:** 2.22 MB
### Dataset Summary
QuaRTz is a crowdsourced dataset of 3864 multiple-choice questions about open-domain qualitative relationships. Each
question is paired with one of 405 different background sentences (sometimes short paragraphs).
The dataset (V1) is split into train (2696), dev (384), and test (784). Each background sentence appears in only one split.
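The guarantee that a background sentence appears in only one split can be checked by comparing `para_id` values across splits. The following is a minimal sketch, assuming each split is available as a list of records carrying a `para_id` field as in the data instance shown in this card. The helper name `shared_para_ids` and the toy records are hypothetical and are not taken from the dataset.

```python
from itertools import combinations


def shared_para_ids(splits):
    """Return {(split_a, split_b): para_id values that occur in both splits}."""
    ids = {name: {rec["para_id"] for rec in recs} for name, recs in splits.items()}
    overlaps = {}
    for a, b in combinations(sorted(ids), 2):
        common = ids[a] & ids[b]
        if common:
            overlaps[(a, b)] = common
    return overlaps


# Toy stand-ins for the real splits (hypothetical IDs, for illustration only).
toy_splits = {
    "train": [{"para_id": "QRSent-1"}, {"para_id": "QRSent-1"}, {"para_id": "QRSent-2"}],
    "validation": [{"para_id": "QRSent-3"}],
    "test": [{"para_id": "QRSent-4"}],
}

# An empty dict means no background sentence is shared between splits.
print(shared_para_ids(toy_splits))
```

Several questions may share one `para_id` within a split, which is why the IDs are collected into sets before comparison.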
### Supported Tasks and Leaderboards
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Languages
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## Dataset Structure
### Data Instances
#### default
- **Size of downloaded dataset files:** 0.49 MB
- **Size of the generated dataset:** 1.72 MB
- **Total amount of disk used:** 2.22 MB
An example from the 'train' split looks as follows.
```
{
    "answerKey": "A",
    "choices": {
        "label": ["A", "B"],
        "text": ["higher", "lower"]
    },
    "id": "QRQA-10116-3",
    "para": "Electrons at lower energy levels, which are closer to the nucleus, have less energy.",
    "para_anno": {
        "cause_dir_sign": "LESS",
        "cause_dir_str": "closer",
        "cause_prop": "distance from a nucleus",
        "effect_dir_sign": "LESS",
        "effect_dir_str": "less",
        "effect_prop": "energy"
    },
    "para_id": "QRSent-10116",
    "question": "Electrons further away from a nucleus have _____ energy levels than close ones.",
    "question_anno": {
        "less_cause_dir": "electron energy levels",
        "less_cause_prop": "nucleus",
        "less_effect_dir": "lower",
        "less_effect_prop": "electron energy levels",
        "more_effect_dir": "higher",
        "more_effect_prop": "electron energy levels"
    }
}
```
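To read off the answer for an instance like the one above, `answerKey` can be matched against `choices.label` to select the corresponding entry of `choices.text`. The sketch below uses a trimmed copy of the example record. The helper names `answer_text` and `fill_blank` are hypothetical, and the blank-filling step assumes the question marks its gap with a run of underscores, as this example does.

```python
import re

# Trimmed copy of the example instance from this card.
example = {
    "answerKey": "A",
    "choices": {"label": ["A", "B"], "text": ["higher", "lower"]},
    "question": "Electrons further away from a nucleus have _____ energy levels than close ones.",
}


def answer_text(record):
    """Map answerKey to the choice text that carries the same label."""
    labels = record["choices"]["label"]
    texts = record["choices"]["text"]
    return texts[labels.index(record["answerKey"])]


def fill_blank(record):
    """Substitute the correct choice into the first run of underscores, if any."""
    answer = answer_text(record)
    return re.sub(r"_{2,}", lambda _match: answer, record["question"], count=1)


print(answer_text(example))
print(fill_blank(example))
```

Looking the label up by position keeps the code independent of the label alphabet, so it does not assume the labels are always "A" and "B".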
### Data Fields
The data fields are the same across all splits.
#### default
- `id`: a `string` feature.
- `question`: a `string` feature.
- `choices`: a dictionary feature containing:
  - `text`: a `string` feature.
  - `label`: a `string` feature.
- `answerKey`: a `string` feature.
- `para`: a `string` feature.
- `para_id`: a `string` feature.
- `para_anno`: a dictionary feature (as shown in the example instance) containing:
  - `effect_prop`: a `string` feature.
  - `cause_dir_str`: a `string` feature.
  - `effect_dir_str`: a `string` feature.
  - `cause_dir_sign`: a `string` feature.
  - `effect_dir_sign`: a `string` feature.
  - `cause_prop`: a `string` feature.
- `question_anno`: a dictionary feature (as shown in the example instance) containing:
  - `more_effect_dir`: a `string` feature.
  - `less_effect_dir`: a `string` feature.
  - `less_cause_prop`: a `string` feature.
  - `more_effect_prop`: a `string` feature.
  - `less_effect_prop`: a `string` feature.
  - `less_cause_dir`: a `string` feature.
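The field list can double as a lightweight schema check. The sketch below follows the nesting shown in the example instance, where the cause and effect annotations sit under `para_anno` and `question_anno`. The names `TOP_LEVEL`, `NESTED`, and `missing_fields` are hypothetical, and the placeholder record holds empty strings rather than real data.

```python
TOP_LEVEL = ("id", "question", "choices", "answerKey", "para", "para_id")
NESTED = {
    "choices": ("text", "label"),
    "para_anno": ("effect_prop", "cause_dir_str", "effect_dir_str",
                  "cause_dir_sign", "effect_dir_sign", "cause_prop"),
    "question_anno": ("more_effect_dir", "less_effect_dir", "less_cause_prop",
                      "more_effect_prop", "less_effect_prop", "less_cause_dir"),
}


def missing_fields(record):
    """Return dotted paths of expected fields that are absent from a record."""
    missing = [key for key in TOP_LEVEL if key not in record]
    for group, keys in NESTED.items():
        sub = record.get(group)
        if not isinstance(sub, dict):
            # The whole group is absent (or not a dict); report it once.
            if group not in missing:
                missing.append(group)
            continue
        missing.extend(f"{group}.{key}" for key in keys if key not in sub)
    return missing


# Placeholder record with every expected field present (values are dummies).
record = {key: "" for key in TOP_LEVEL}
for group, keys in NESTED.items():
    record[group] = {key: "" for key in keys}

print(missing_fields(record))          # complete record
del record["para_anno"]["cause_prop"]
print(missing_fields(record))          # one annotation field removed
```

The check only looks at field presence. It does not validate value types, since the example instance stores `choices.text` and `choices.label` as lists of strings.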
### Data Splits
| name |train|validation|test|
|-------|----:|---------:|---:|
|default| 2696| 384| 784|
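As a quick consistency check, the split sizes in the table should add up to the 3864 questions stated in the summary. The short sketch below uses only the numbers given in this card; the variable names are illustrative.

```python
# Split sizes and background-sentence count as stated in this card.
split_sizes = {"train": 2696, "validation": 384, "test": 784}
num_backgrounds = 405

total = sum(split_sizes.values())
for name, size in split_sizes.items():
    print(f"{name}: {size} questions ({size / total:.1%} of the dataset)")

print("total questions:", total)
print("average questions per background sentence:", round(total / num_backgrounds, 2))
```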
## Dataset Creation
### Curation Rationale
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
#### Who are the source language producers?
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Annotations
#### Annotation process
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
#### Who are the annotators?
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Personal and Sensitive Information
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Discussion of Biases
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Other Known Limitations
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## Additional Information
### Dataset Curators
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Licensing Information
The dataset is licensed under Creative Commons [Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).
### Citation Information
```
@InProceedings{quartz,
  author = {Oyvind Tafjord and Matt Gardner and Kevin Lin and Peter Clark},
  title = {QUARTZ: An Open-Domain Dataset of Qualitative Relationship Questions},
  year = {2019},
}
```
### Contributions
Thanks to [@patrickvonplaten](https://github.com/patrickvonplaten), [@lewtun](https://github.com/lewtun), [@thomwolf](https://github.com/thomwolf) for adding this dataset.
Provider: maas
Created: 2025-05-27



