TasnimKabir12/QB2NQ

Name: TasnimKabir12/QB2NQ
Creator: TasnimKabir12
Published: 2026-03-08 13:40:40
License: MIT (per the dataset card metadata; the listing page itself gives no description)

Source: Hugging Face. Updated: 2026-03-08. Indexed: 2026-03-29.

Download link:

https://hf-mirror.com/datasets/TasnimKabir12/QB2NQ

Resource overview:

---
language:
- en
license: mit
task_categories:
- question-answering
task_ids:
- open-domain-qa
pretty_name: QB2NQ
---

# QB2NQ: Quiz Bowl to Natural Questions

## Dataset Description

QB2NQ is a dataset of natural questions generated from Quiz Bowl (QB) questions by applying heuristic transformations to convert quiz bowl clues into natural language questions.

## Dataset Structure

Each example contains:

- `id`: Unique identifier
- `qanta_id`: Original QB question ID
- `question`: Transformed natural language question
- `answer`: Correct answer
- `answer_type`: Canonical answer type (e.g., "character", "composer")
- `context`: Original QB sentence before transformation
- `transformations`: List of heuristics applied (SplitConjunction, ImperativeToInterrogative, NoWhWords)

## Splits

| Split      | Examples |
|------------|----------|
| Train      | ~80%     |
| Validation | ~10%     |
| Test       | ~10%     |

## Heuristics Applied

1. **SplitConjunction**: Splits sentences joined by coordinating conjunctions
2. **ImperativeToInterrogative**: Converts FTP/points patterns to questions
3. **NoWhWords**: Reformulates statements without wh-words into questions

## Example

```json
{
  "question": "Which character is native of Rokovoko and savage companion of Ishmael in Moby - Dick?",
  "answer": "Queequeg",
  "answer_type": "character",
  "context": "For 10 points, name this native of Rokovoko and savage companion of Ishmael in Moby-Dick.",
  "transformations": ["ImperativeToInterrogative"]
}
```

## Citation

```
@inproceedings{kabir-etal-2024-make,
    title = "You Make me Feel like a Natural Question: Training {QA} Systems on Transformed Trivia Questions",
    author = "Kabir, Tasnim and Sung, Yoo Yeon and Bandyopadhyay, Saptarashmi and Zou, Hao and Chandra, Abhranil and Boyd-Graber, Jordan Lee",
    editor = "Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.1140/",
    doi = "10.18653/v1/2024.emnlp-main.1140",
    pages = "20486--20510",
    abstract = "Training question-answering QA and information retrieval systems for web queries require large, expensive datasets that are difficult to annotate and time-consuming to gather. Moreover, while natural datasets of information-seeking questions are often prone to ambiguity or ill-formed, there are troves of freely available, carefully crafted question datasets for many languages. Thus, we automatically generate shorter, information-seeking questions, resembling web queries in the style of the Natural Questions (NQ) dataset from longer trivia data. Training a QA system on these transformed questions is a viable strategy for alternating to more expensive training setups showing the F1 score difference of less than six points and contrasting the final systems."
}
```
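
## Usage Sketch

The record layout and the ImperativeToInterrogative heuristic described above can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation. The field types in `EXPECTED_FIELDS`, the regular expression `FTP_PATTERN`, and the helper names `validate_record` and `imperative_to_interrogative` are all assumptions made for illustration; only the field names and heuristic names come from this card. The toy rewrite handles a single "For 10 points, name this ..." pattern and does not reproduce the tokenization seen in the published example ("Moby - Dick").

```python
import re

# Field names come from the dataset card; the Python types are assumptions.
EXPECTED_FIELDS = {
    "id": (int, str),
    "qanta_id": (int, str),
    "question": str,
    "answer": str,
    "answer_type": str,
    "context": str,
    "transformations": list,
}

# Heuristic names listed in the dataset card.
KNOWN_TRANSFORMATIONS = {
    "SplitConjunction",
    "ImperativeToInterrogative",
    "NoWhWords",
}

# Hypothetical pattern for one common quiz bowl imperative; real data
# contains many more variants than this single regex covers.
FTP_PATTERN = re.compile(r"^(?:For 10 points|FTP),?\s+name\s+this\s+", re.IGNORECASE)


def validate_record(record):
    """Return a list of problems found in one example (empty if none)."""
    problems = []
    for field, expected in EXPECTED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"unexpected type for {field}")
    transformations = record.get("transformations")
    if isinstance(transformations, list):
        for name in transformations:
            if name not in KNOWN_TRANSFORMATIONS:
                problems.append(f"unknown transformation: {name}")
    return problems


def imperative_to_interrogative(context, answer_type):
    """Toy rewrite of 'For 10 points, name this X.' into 'Which <type> is X?'.

    Returns None when the sentence does not match the assumed pattern.
    """
    match = FTP_PATTERN.match(context)
    if match is None:
        return None
    remainder = context[match.end():].rstrip()
    if remainder.endswith("."):
        remainder = remainder[:-1]
    return f"Which {answer_type} is {remainder}?"


# The example record shown in this card (it omits `id` and `qanta_id`).
card_example = {
    "question": "Which character is native of Rokovoko and savage companion of Ishmael in Moby - Dick?",
    "answer": "Queequeg",
    "answer_type": "character",
    "context": "For 10 points, name this native of Rokovoko and savage companion of Ishmael in Moby-Dick.",
    "transformations": ["ImperativeToInterrogative"],
}

problems = validate_record(card_example)
rewritten = imperative_to_interrogative(
    card_example["context"], card_example["answer_type"]
)
print(problems)
print(rewritten)
```

Because the card's example omits `id` and `qanta_id`, the validator flags those two fields as missing. Records loaded from the actual dataset should be checked against the published schema rather than the types assumed here.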
