TasnimKabir12/QB2NQ
Source: Hugging Face | Updated: 2026-03-08 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/TasnimKabir12/QB2NQ
Resource description:
---
language:
- en
license: mit
task_categories:
- question-answering
task_ids:
- open-domain-qa
pretty_name: QB2NQ
---
# QB2NQ: Quiz Bowl to Natural Questions
## Dataset Description
QB2NQ is a dataset of short, information-seeking questions in the style of Natural Questions (NQ),
generated from Quiz Bowl (QB) questions by applying heuristic transformations that convert
Quiz Bowl clues into natural-language questions.
## Dataset Structure
Each example contains:
- `id`: Unique identifier
- `qanta_id`: Original QB question ID
- `question`: Transformed natural language question
- `answer`: Correct answer
- `answer_type`: Canonical answer type (e.g., "character", "composer")
- `context`: Original QB sentence before transformation
- `transformations`: List of heuristics applied (SplitConjunction, ImperativeToInterrogative, NoWhWords)
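The fields above can be mirrored in a small typed record. This is an illustrative sketch only: the class name `QB2NQExample`, the `from_dict` helper, and the choice to coerce both IDs to strings are assumptions, since the card does not specify field types.

```python
from dataclasses import dataclass, field
from typing import List

# Heuristic names exactly as listed on this dataset card.
KNOWN_TRANSFORMATIONS = {"SplitConjunction", "ImperativeToInterrogative", "NoWhWords"}


@dataclass
class QB2NQExample:
    """One QB2NQ record (hypothetical helper); field names follow the card."""
    id: str
    qanta_id: str
    question: str
    answer: str
    answer_type: str
    context: str
    transformations: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, row: dict) -> "QB2NQExample":
        # Assumption: IDs are coerced to str because the card does not state their type.
        ex = cls(
            id=str(row["id"]),
            qanta_id=str(row["qanta_id"]),
            question=row["question"],
            answer=row["answer"],
            answer_type=row["answer_type"],
            context=row["context"],
            transformations=list(row.get("transformations", [])),
        )
        # Reject heuristic names that the card does not list.
        unknown = set(ex.transformations) - KNOWN_TRANSFORMATIONS
        if unknown:
            raise ValueError(f"unknown transformations: {sorted(unknown)}")
        return ex
```

With the Hugging Face `datasets` library installed, rows returned by `load_dataset("TasnimKabir12/QB2NQ")` could be passed to `QB2NQExample.from_dict`, assuming the hosted files use the field names listed above.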
## Splits
| Split      | Approx. share of examples |
|------------|---------------------------|
| Train      | ~80%                      |
| Validation | ~10%                      |
| Test       | ~10%                      |
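The card gives only approximate proportions and does not say how examples were assigned to splits. If you re-split the data yourself, one reasonable option (an assumption, not the authors' documented procedure) is to hash `qanta_id`, so that several questions derived from the same QB question cannot leak between train and test. The function name `assign_split` is hypothetical.

```python
import hashlib


def assign_split(qanta_id: str, train: float = 0.8, validation: float = 0.1) -> str:
    """Deterministically map a QB question ID to 'train', 'validation', or 'test'."""
    if not (0.0 <= train and 0.0 <= validation and train + validation <= 1.0):
        raise ValueError("split fractions must be non-negative and sum to at most 1")
    digest = hashlib.sha256(str(qanta_id).encode("utf-8")).digest()
    # Interpret the first 8 bytes as an approximately uniform float in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < train:
        return "train"
    if u < train + validation:
        return "validation"
    return "test"
```

Hashing the QB ID rather than the row `id` is the key design choice: every question that shares a `qanta_id` lands in the same split.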
## Heuristics Applied
1. **SplitConjunction**: Splits sentences joined by coordinating conjunctions
2. **ImperativeToInterrogative**: Converts imperative "For 10 points (FTP), name this ..." prompts into interrogative questions
3. **NoWhWords**: Reformulates statements without wh-words into questions
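As a rough illustration of heuristic 2, the toy function below rewrites the card's own example. It is a hypothetical, regex-based approximation, not the authors' implementation: the pattern only covers "For 10 points, name/identify this ..." prompts, and building the "Which ..." stem from `answer_type` is an assumption about how the stem in the example is chosen.

```python
import re
from typing import Optional

# Toy pattern for an FTP prompt such as "For 10 points, name this ...".
FTP_PATTERN = re.compile(
    r"^\s*for\s+(?:10|ten)\s+points\s*,\s*(?:name|identify)\s+this\s+"
    r"(?P<rest>.+?)\s*[.?!]?\s*$",
    re.IGNORECASE,
)


def imperative_to_interrogative(sentence: str, answer_type: str) -> Optional[str]:
    """Return a 'Which <answer_type> is ...?' question, or None if no FTP prompt matches."""
    match = FTP_PATTERN.match(sentence)
    if match is None:
        return None
    # Drop the imperative prefix and the final punctuation; keep the clue body as-is.
    return f"Which {answer_type} is {match.group('rest')}?"
```

For the example context below, this yields "Which character is native of Rokovoko and savage companion of Ishmael in Moby-Dick?". The released `question` text appears tokenized (e.g., "Moby - Dick"), which this sketch does not reproduce.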
## Example
```json
{
  "question": "Which character is native of Rokovoko and savage companion of Ishmael in Moby - Dick?",
  "answer": "Queequeg",
  "answer_type": "character",
  "context": "For 10 points, name this native of Rokovoko and savage companion of Ishmael in Moby-Dick.",
  "transformations": ["ImperativeToInterrogative"]
}
```
## Citation
```bibtex
@inproceedings{kabir-etal-2024-make,
title = "You Make me Feel like a Natural Question: Training {QA} Systems on Transformed Trivia Questions",
author = "Kabir, Tasnim and
Sung, Yoo Yeon and
Bandyopadhyay, Saptarashmi and
Zou, Hao and
Chandra, Abhranil and
Boyd-Graber, Jordan Lee",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.1140/",
doi = "10.18653/v1/2024.emnlp-main.1140",
pages = "20486--20510",
abstract = "Training question-answering QA and information retrieval systems for web queries require large, expensive datasets that are difficult to annotate and time-consuming to gather. Moreover, while natural datasets of information-seeking questions are often prone to ambiguity or ill-formed, there are troves of freely available, carefully crafted question datasets for many languages. Thus, we automatically generate shorter, information-seeking questions, resembling web queries in the style of the Natural Questions (NQ) dataset from longer trivia data. Training a QA system on these transformed questions is a viable strategy for alternating to more expensive training setups showing the F1 score difference of less than six points and contrasting the final systems."
}
```
Provided by:
TasnimKabir12



