
TasnimKabir12/QB2NQ

Source: Hugging Face · Updated 2026-03-08 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/TasnimKabir12/QB2NQ
Description:
---
language:
- en
license: mit
task_categories:
- question-answering
task_ids:
- open-domain-qa
pretty_name: QB2NQ
---

# QB2NQ: Quiz Bowl to Natural Questions

## Dataset Description

QB2NQ is a dataset of natural questions generated from Quiz Bowl (QB) questions by applying heuristic transformations to convert quiz bowl clues into natural language questions.

## Dataset Structure

Each example contains:

- `id`: Unique identifier
- `qanta_id`: Original QB question ID
- `question`: Transformed natural language question
- `answer`: Correct answer
- `answer_type`: Canonical answer type (e.g., "character", "composer")
- `context`: Original QB sentence before transformation
- `transformations`: List of heuristics applied (SplitConjunction, ImperativeToInterrogative, NoWhWords)

## Splits

| Split      | Share of examples |
|------------|-------------------|
| Train      | ~80%              |
| Validation | ~10%              |
| Test       | ~10%              |

## Heuristics Applied

1. **SplitConjunction**: Splits sentences joined by coordinating conjunctions
2. **ImperativeToInterrogative**: Converts FTP/points patterns to questions
3. **NoWhWords**: Reformulates statements without wh-words into questions

## Example

```json
{
  "question": "Which character is native of Rokovoko and savage companion of Ishmael in Moby - Dick?",
  "answer": "Queequeg",
  "answer_type": "character",
  "context": "For 10 points, name this native of Rokovoko and savage companion of Ishmael in Moby-Dick.",
  "transformations": ["ImperativeToInterrogative"]
}
```

## Citation

```
@inproceedings{kabir-etal-2024-make,
    title = "You Make me Feel like a Natural Question: Training {QA} Systems on Transformed Trivia Questions",
    author = "Kabir, Tasnim and Sung, Yoo Yeon and Bandyopadhyay, Saptarashmi and Zou, Hao and Chandra, Abhranil and Boyd-Graber, Jordan Lee",
    editor = "Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.1140/",
    doi = "10.18653/v1/2024.emnlp-main.1140",
    pages = "20486--20510",
    abstract = "Training question-answering QA and information retrieval systems for web queries require large, expensive datasets that are difficult to annotate and time-consuming to gather. Moreover, while natural datasets of information-seeking questions are often prone to ambiguity or ill-formed, there are troves of freely available, carefully crafted question datasets for many languages. Thus, we automatically generate shorter, information-seeking questions, resembling web queries in the style of the Natural Questions (NQ) dataset from longer trivia data. Training a QA system on these transformed questions is a viable strategy for alternating to more expensive training setups showing the F1 score difference of less than six points and contrasting the final systems."
}
```
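
## Illustrative Sketch

To make the ImperativeToInterrogative heuristic and the record layout more concrete, the following is a minimal Python sketch. It is not the authors' implementation: the regular expression `FTP_PATTERN`, the function name `imperative_to_interrogative`, and the "Which <answer_type> is ...?" template are all assumptions inferred from the single example on the card. It covers only the "For N points, name this ..." pattern and does not reproduce the tokenized spacing ("Moby - Dick") seen in the published `question` field.

```python
import re
from typing import Optional

# Hypothetical pattern for the "For 10 points, name this ..." (FTP) cue.
# The dataset's real rules are not documented on the card; this is a guess.
FTP_PATTERN = re.compile(
    r"^\s*(?:For\s+(?:\d+|ten)\s+points|FTP),?\s+"
    r"name\s+(?:this|these)\s+(?P<clue>.+?)[.?!]?\s*$",
    re.IGNORECASE,
)


def imperative_to_interrogative(sentence: str, answer_type: str) -> Optional[str]:
    """Turn an imperative quiz bowl sentence into a question.

    Returns None when the sentence does not match the assumed FTP pattern.
    """
    match = FTP_PATTERN.match(sentence)
    if match is None:
        return None
    # Assumed template: "Which <answer_type> is <clue>?"
    return f"Which {answer_type} is {match.group('clue')}?"


if __name__ == "__main__":
    context = (
        "For 10 points, name this native of Rokovoko and savage "
        "companion of Ishmael in Moby-Dick."
    )
    question = imperative_to_interrogative(context, "character")

    # Assemble a record using the field names listed under Dataset Structure
    # (id and qanta_id are omitted, as in the card's own example).
    record = {
        "question": question,
        "answer": "Queequeg",
        "answer_type": "character",
        "context": context,
        "transformations": ["ImperativeToInterrogative"],
    }
    print(record["question"])
```

Keeping the original sentence in `context` alongside the transformed `question`, as the card's schema does, makes it possible to audit any such rule against its input.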
Provider:
TasnimKabir12