cfq
ModelScope Community, updated 2025-12-05, indexed 2025-07-12
Download link:
https://modelscope.cn/datasets/google-research-datasets/cfq
Resource description:
# Dataset Card for "cfq"
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** [https://github.com/google-research/google-research/tree/master/cfq](https://github.com/google-research/google-research/tree/master/cfq)
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Paper:** https://arxiv.org/abs/1912.09713
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Size of downloaded dataset files:** 2.14 GB
- **Size of the generated dataset:** 362.07 MB
- **Total amount of disk used:** 2.50 GB
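A minimal loading sketch follows. It assumes two things: the Hugging Face `datasets` library is installed, and the dataset is reachable under the Hub ID `google-research-datasets/cfq`. That ID mirrors the ModelScope namespace above, but it is an assumption. The configuration names come from the Data Splits table below. `load_cfq`, `CFQ_CONFIGS` and `HUB_ID` are hypothetical helper names.

```python
# Configuration names as listed in the Data Splits table of this card.
CFQ_CONFIGS = (
    "mcd1",
    "mcd2",
    "mcd3",
    "query_complexity_split",
    "query_pattern_split",
    "question_complexity_split",
    "question_pattern_split",
    "random_split",
)

# Assumed Hub ID; adjust it if the dataset is mirrored elsewhere.
HUB_ID = "google-research-datasets/cfq"


def load_cfq(config: str = "mcd1"):
    """Load one CFQ configuration (hypothetical helper).

    The configuration name is validated before `datasets` is imported,
    so a typo fails fast without any network access.
    """
    if config not in CFQ_CONFIGS:
        raise ValueError(f"unknown CFQ configuration: {config!r}")
    from datasets import load_dataset  # imported lazily on purpose

    # Without a `split` argument this returns a DatasetDict; per this card,
    # it should contain "train" and "test" splits.
    return load_dataset(HUB_ID, config)
```

Typical use would be `ds = load_cfq("mcd1")` followed by `ds["train"][0]["question"]`. Validating the name locally keeps errors readable and avoids a network round trip for a misspelled configuration.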
### Dataset Summary
The Compositional Freebase Questions (CFQ) dataset is specifically designed to measure compositional
generalization. CFQ is a large, simple yet realistic dataset of natural language questions and answers that also
provides, for each question, a corresponding SPARQL query against the Freebase knowledge base. CFQ can therefore
also be used for semantic parsing.
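Because each question is paired with a SPARQL query, a semantic parser can be scored by comparing its predicted queries against the gold ones. The sketch below assumes exact-match accuracy after whitespace normalization, a common choice for this kind of task. It is not claimed to be the official CFQ evaluation procedure. `normalize_query` and `exact_match_accuracy` are hypothetical names.

```python
def normalize_query(query: str) -> str:
    """Collapse every run of whitespace (including newlines) to a single space."""
    return " ".join(query.split())


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predicted queries equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        normalize_query(p) == normalize_query(r)
        for p, r in zip(predictions, references)
    )
    return hits / len(references)
```

Only whitespace is normalized here. A prediction that is semantically equivalent but lists its triple patterns in a different order would still count as a miss.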
### Supported Tasks and Leaderboards
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Languages
English (`en`).
## Dataset Structure
### Data Instances
#### mcd1
- **Size of downloaded dataset files:** 267.60 MB
- **Size of the generated dataset:** 42.90 MB
- **Total amount of disk used:** 310.49 MB
An example from the `train` split looks as follows.
```
{
'query': 'SELECT count(*) WHERE {\n?x0 a ns:people.person .\n?x0 ns:influence.influence_node.influenced M1 .\n?x0 ns:influence.influence_node.influenced M2 .\n?x0 ns:people.person.spouse_s/ns:people.marriage.spouse|ns:fictional_universe.fictional_character.married_to/ns:fictional_universe.marriage_of_fictional_characters.spouses ?x1 .\n?x1 a ns:film.cinematographer .\nFILTER ( ?x0 != ?x1 )\n}',
'question': 'Did a person marry a cinematographer , influence M1 , and influence M2'
}
```
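In the example above, named entities are masked as placeholders such as `M1` and `M2`, and the same placeholders appear in both the question and the query. The sketch below checks that the two fields agree. It assumes placeholders always match `M` followed by digits, a pattern inferred from the examples in this card rather than from a formal specification. `entity_placeholders` is a hypothetical helper.

```python
import re

# Assumption: entity placeholders look like M0, M1, ... (inferred from the examples).
PLACEHOLDER = re.compile(r"\bM\d+\b")


def entity_placeholders(text: str) -> set[str]:
    """Return the set of entity placeholders mentioned in a question or query."""
    return set(PLACEHOLDER.findall(text))


# The mcd1 training example shown above.
example = {
    "query": (
        "SELECT count(*) WHERE {\n?x0 a ns:people.person .\n"
        "?x0 ns:influence.influence_node.influenced M1 .\n"
        "?x0 ns:influence.influence_node.influenced M2 .\n"
        "?x0 ns:people.person.spouse_s/ns:people.marriage.spouse"
        "|ns:fictional_universe.fictional_character.married_to"
        "/ns:fictional_universe.marriage_of_fictional_characters.spouses ?x1 .\n"
        "?x1 a ns:film.cinematographer .\nFILTER ( ?x0 != ?x1 )\n}"
    ),
    "question": "Did a person marry a cinematographer , influence M1 , and influence M2",
}

# Both fields refer to exactly the same masked entities.
print(sorted(entity_placeholders(example["question"])))  # ['M1', 'M2']
print(entity_placeholders(example["question"]) == entity_placeholders(example["query"]))  # True
```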
#### mcd2
- **Size of downloaded dataset files:** 267.60 MB
- **Size of the generated dataset:** 44.77 MB
- **Total amount of disk used:** 312.38 MB
An example from the `train` split looks as follows.
```
{
'query': 'SELECT count(*) WHERE {\n?x0 ns:people.person.parents|ns:fictional_universe.fictional_character.parents|ns:organization.organization.parent/ns:organization.organization_relationship.parent ?x1 .\n?x1 a ns:people.person .\nM1 ns:business.employer.employees/ns:business.employment_tenure.person ?x0 .\nM1 ns:business.employer.employees/ns:business.employment_tenure.person M2 .\nM1 ns:business.employer.employees/ns:business.employment_tenure.person M3 .\nM1 ns:business.employer.employees/ns:business.employment_tenure.person M4 .\nM5 ns:business.employer.employees/ns:business.employment_tenure.person ?x0 .\nM5 ns:business.employer.employees/ns:business.employment_tenure.person M2 .\nM5 ns:business.employer.employees/ns:business.employment_tenure.person M3 .\nM5 ns:business.employer.employees/ns:business.employment_tenure.person M4\n}',
'question': "Did M1 and M5 employ M2 , M3 , and M4 and employ a person 's child"
}
```
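Queries such as the mcd2 example above place one triple pattern per line inside the `WHERE { ... }` block, separated by a trailing ` .`. The sketch below relies on that line-oriented layout, which is observed in the examples here and not guaranteed by SPARQL itself. It splits a query into (subject, predicate, object) triples and skips `FILTER` lines. It is a string heuristic, not a SPARQL parser. `triple_patterns` is a hypothetical name.

```python
def triple_patterns(query: str) -> list[tuple[str, str, str]]:
    """Split a line-oriented CFQ-style query into (subject, predicate, object) triples.

    Heuristic: assumes one triple pattern per line between the first "{" and
    the last "}", and skips FILTER lines. Not a general SPARQL parser.
    """
    body = query[query.index("{") + 1 : query.rindex("}")]
    triples = []
    for line in body.splitlines():
        line = line.strip()
        if line.endswith(" ."):
            line = line[:-2].rstrip()  # drop the triple separator
        if not line or line.startswith("FILTER"):
            continue
        # Property paths in the examples contain no spaces, so split into 3 parts.
        subject, predicate, obj = line.split(" ", 2)
        triples.append((subject, predicate, obj))
    return triples
```

Applied to the mcd2 query above, this heuristic yields ten triples. Eight of them share the predicate `ns:business.employer.employees/ns:business.employment_tenure.person`.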
#### mcd3
- **Size of downloaded dataset files:** 267.60 MB
- **Size of the generated dataset:** 43.60 MB
- **Total amount of disk used:** 311.20 MB
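An example from the `train` split looks as follows. As given, its `query` is not valid SPARQL, unlike the mcd1 and mcd2 examples, so it appears to be a simplified placeholder rather than an actual record.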
```
{
"query": "SELECT /producer M0 . /director M0 . ",
"question": "Who produced and directed M0?"
}
```
#### query_complexity_split
- **Size of downloaded dataset files:** 267.60 MB
- **Size of the generated dataset:** 45.95 MB
- **Total amount of disk used:** 313.55 MB
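An example from the `train` split looks as follows. As given, its `query` is not valid SPARQL, unlike the mcd1 and mcd2 examples, so it appears to be a simplified placeholder rather than an actual record.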
```
{
"query": "SELECT /producer M0 . /director M0 . ",
"question": "Who produced and directed M0?"
}
```
#### query_pattern_split
- **Size of downloaded dataset files:** 267.60 MB
- **Size of the generated dataset:** 46.12 MB
- **Total amount of disk used:** 313.72 MB
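An example from the `train` split looks as follows. As given, its `query` is not valid SPARQL, unlike the mcd1 and mcd2 examples, so it appears to be a simplified placeholder rather than an actual record.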
```
{
"query": "SELECT /producer M0 . /director M0 . ",
"question": "Who produced and directed M0?"
}
```
### Data Fields
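The data fields are the same across all splits and configurations:
- `question`: a `string` feature holding the natural-language question, with entities masked as placeholders such as `M1` and `M2`.
- `query`: a `string` feature holding the corresponding SPARQL query against Freebase, using the same entity placeholders.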
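Every record has exactly these two string fields, so preparing CFQ for semantic parsing amounts to treating `question` as the source sequence and `query` as the target. The sketch below makes two assumptions: whitespace tokenization is acceptable, since questions in the examples already look pre-tokenized (e.g. `cinematographer ,`), and the newlines inside `query` can be flattened. `CfqRecord` and `to_seq2seq_pair` are hypothetical names, not part of any library.

```python
from typing import TypedDict


class CfqRecord(TypedDict):
    """One CFQ record, mirroring the two string fields listed above."""
    question: str
    query: str


def to_seq2seq_pair(record: CfqRecord) -> tuple[list[str], list[str]]:
    """Turn a record into (source tokens, target tokens) by whitespace splitting.

    str.split() with no arguments also splits on the newlines inside `query`.
    """
    return record["question"].split(), record["query"].split()


record: CfqRecord = {
    "question": "Did a person marry a cinematographer , influence M1 , and influence M2",
    # Query shortened for illustration; real queries are longer.
    "query": "SELECT count(*) WHERE {\n?x0 a ns:people.person .\n}",
}
source, target = to_seq2seq_pair(record)
print(source[:4])  # ['Did', 'a', 'person', 'marry']
print(target[:4])  # ['SELECT', 'count(*)', 'WHERE', '{']
```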
### Data Splits
| configuration | train | test |
|---------------------------|-------:|------:|
| mcd1 | 95743 | 11968 |
| mcd2 | 95743 | 11968 |
| mcd3 | 95743 | 11968 |
| query_complexity_split | 100654 | 9512 |
| query_pattern_split | 94600 | 12589 |
| question_complexity_split | 98999 | 10340 |
| question_pattern_split | 95654 | 11909 |
| random_split | 95744 | 11967 |
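The table can be sanity-checked with a little arithmetic. The sketch below copies the counts from the table and prints each configuration's total size and test fraction. It only restates the numbers above.

```python
# (train, test) counts copied from the Data Splits table.
SPLITS = {
    "mcd1": (95743, 11968),
    "mcd2": (95743, 11968),
    "mcd3": (95743, 11968),
    "query_complexity_split": (100654, 9512),
    "query_pattern_split": (94600, 12589),
    "question_complexity_split": (98999, 10340),
    "question_pattern_split": (95654, 11909),
    "random_split": (95744, 11967),
}

for name, (train, test) in SPLITS.items():
    total = train + test
    print(f"{name:<26} total={total:>6} test_fraction={test / total:.3f}")
```

The three MCD configurations and `random_split` all total 107,711 examples, with about 11.1% held out for testing. `query_complexity_split` holds out about 8.6% of its 110,166 examples.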
## Dataset Creation
### Curation Rationale
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
#### Who are the source language producers?
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Annotations
#### Annotation process
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
#### Who are the annotators?
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Personal and Sensitive Information
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Discussion of Biases
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Other Known Limitations
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## Additional Information
### Dataset Curators
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Licensing Information
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Citation Information
```
@inproceedings{Keysers2020,
title={Measuring Compositional Generalization: A Comprehensive Method on
Realistic Data},
author={Daniel Keysers and Nathanael Sch\"{a}rli and Nathan Scales and
Hylke Buisman and Daniel Furrer and Sergii Kashubin and
Nikola Momchev and Danila Sinopalnikov and Lukasz Stafiniak and
Tibor Tihon and Dmitry Tsarkov and Xiao Wang and Marc van Zee and
Olivier Bousquet},
booktitle={ICLR},
year={2020},
url={https://arxiv.org/abs/1912.09713},
}
```
### Contributions
Thanks to [@thomwolf](https://github.com/thomwolf), [@patrickvonplaten](https://github.com/patrickvonplaten), [@lewtun](https://github.com/lewtun), [@brainshawn](https://github.com/brainshawn) for adding this dataset.
Provider:
maas
Created:
2025-07-07



