orgrctera/uda_nq_qa_qa

Source: Hugging Face. Updated 2026-03-21; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/orgrctera/uda_nq_qa_qa
Resource description:
---
license: apache-2.0
language:
- en
pretty_name: UDA NQ (Natural Questions — Question Answering)
size_categories:
- 1K<n<5K
tags:
- question-answering
- open-domain
- wikipedia
- unstructured-documents
- natural-questions
- task:question-answering
configs:
- config_name: default
  data_files:
  - split: default
    path: data/default-*
dataset_info:
  features:
  - name: input
    dtype: string
  - name: metadata
    dtype: string
  - name: answers
    dtype: string
  - name: doc_url
    dtype: string
  splits:
  - name: default
    num_bytes: 4255499
    num_examples: 2477
  download_size: 1087542
  dataset_size: 4255499
---

# UDA NQ: Natural Questions Question Answering (`orgrctera/uda_nq_qa_qa`)

## Dataset description

This release is the **Natural Questions (NQ)-aligned Question Answering (QA)** split from the **UDA (Unstructured Document Analysis)** benchmark: **2,477** instances where each row pairs a real **user-style question** with **gold answers** grounded in a **Wikipedia** article, packaged in UDA’s unified format for training and evaluation of **reading comprehension** and **RAG** systems.

**Natural Questions** (Kwiatkowski et al., TACL 2019) is a large-scale benchmark built from **real anonymized search queries** matched to Wikipedia pages. Annotators identify **long answers** (typically a supporting passage) and **short answers** (one or more minimal spans or entities) when present, or leave short answers empty when the task is best answered narratively. The dataset is widely used for **open-domain** and **machine reading comprehension** research.

**UDA** (Hui et al., NeurIPS 2024) repackages multiple domains, including a **Knowledge Base** track (**NqText**), so that documents remain in **realistic** forms and **retrieval, chunking, and parsing** matter alongside generation. The NQ-aligned portion in the UDA paper corresponds to **645** documents and **2,477** Q&A pairs; this Hub dataset matches that **2,477** QA scale in a single `default` split (`sub_benchmark`: `nq_qa`).

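As a starting point, the single `default` split can be pulled with the Hugging Face `datasets` library. This is a minimal sketch rather than an official loader: the helper name `load_uda_nq` is hypothetical, and it assumes the `datasets` package is installed and the Hub repository is reachable from your environment.

```python
REPO_ID = "orgrctera/uda_nq_qa_qa"  # repository id taken from this card
SPLIT = "default"                   # the only split listed in the card metadata


def load_uda_nq(repo_id: str = REPO_ID, split: str = SPLIT):
    """Download (or reuse the cached copy of) one split of the dataset.

    Hypothetical helper: the import is deferred so that this module can be
    imported even when the `datasets` package is not installed.
    """
    from datasets import load_dataset  # pip install datasets

    return load_dataset(repo_id, split=split)
```

Calling `load_uda_nq()` performs a network download on first use; the card metadata reports 2,477 rows for the `default` split, which is a quick sanity check on the result.
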
## The task

- **Task type:** **Question Answering (QA)** for **Natural Questions**: given the question in `input`, predict answers consistent with information in the target document (here, supervision is provided as **long** and **short** reference answers tied to a **Wikipedia URL**). Systems may be evaluated with **exact match**, **token F1**, or other NQ-style metrics on short/long spans, depending on your protocol.
- **Input (`input`):** A natural-language question (derived from real user queries in the original NQ pipeline).
- **Target (`expected_output`):** A JSON **string** with:
  - **`answers`:** Object containing:
    - **`long_answer`:** A passage-level (or paragraph-style) reference answer grounded in the page.
    - **`short_answer`:** A minimal answer string when applicable (may be **empty** when no concise span is defined).
  - **`doc_url`:** The Wikipedia article URL (with revision) associated with the question, for traceability and retrieval-oriented evaluation.
- **Metadata (`metadata`):** `benchmark_name` (`uda_nq_qa`), `benchmark_type` (`uda`), `split` (`default`), `sub_benchmark` (`nq_qa`), and `value` (JSON string with identifiers such as `label_key`, `label_file`, `doc_url`, `q_uid`).

**Splits:** Single split `default` with **2,477** examples (Parquet: `data/default-00000-of-00001.parquet`).

## Background

### Natural Questions

NQ was introduced to push **question answering** toward **real queries** and **full-document** reading: questions come from search logs; evidence is a **whole Wikipedia page** (not only a short paragraph selected for convenience). The paper reports large training/dev/test splits and analyzes **human agreement** and model upper bounds. NQ remains a standard benchmark for **open-domain QA** and **passage/span extraction** from web-scale text.

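Returning to the row format and metrics named in the task section, the sketch below decodes an `expected_output` JSON string and scores a prediction against the short answer with exact match and token F1. Several things here are assumptions rather than part of this dataset or of the official NQ scorer: the helper names are hypothetical, the text normalization follows the common SQuAD-style recipe, and the column name `expected_output` is taken from the task description (the card metadata lists `answers` and `doc_url` as separate string columns, so adapt the parsing to what you actually load).

```python
import html
import json
import re
import string
from collections import Counter
from typing import List


def parse_expected_output(raw: str) -> dict:
    """Decode the `expected_output` JSON string into a flat dict."""
    obj = json.loads(raw)
    answers = obj.get("answers", {})
    return {
        "long_answer": answers.get("long_answer", ""),
        "short_answer": answers.get("short_answer", ""),
        # Example rows in this card show `&amp;` inside the URL; unescape it.
        "doc_url": html.unescape(obj.get("doc_url", "")),
    }


def normalize(text: str) -> List[str]:
    """SQuAD-style normalization (an assumption, not NQ's official scorer):
    lowercase, strip ASCII punctuation, drop English articles, tokenize."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Row abridged from Example 1 in this card (long answer cut to one sentence).
raw = json.dumps({
    "answers": {
        "long_answer": "Article III of the United States Constitution "
                       "does not specify the number of justices.",
        "short_answer": "Congress added",
    },
    "doc_url": "https://en.wikipedia.org//w/index.php"
               "?title=Supreme_Court_of_the_United_States&amp;oldid=815925358",
})
gold = parse_expected_output(raw)
em = exact_match("Congress added", gold["short_answer"])
f1 = token_f1("the Congress", gold["short_answer"])
print(em, round(f1, 3))
```

Rows whose `short_answer` is empty need a protocol decision, for example scoring against `long_answer` or excluding them from short-answer metrics; the sketch only guarantees that two empty strings compare as a match.
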
### UDA benchmark

UDA aggregates **thousands** of real documents and **tens of thousands** of annotated Q&A pairs across domains (finance, academic papers, knowledge bases), with documents provided so that **realistic ingestion** (e.g., PDF/HTML) stresses **retrieval** and **parsing**. The **NqText** slice (Natural Questions-style QA over text) aligns with this dataset’s **2,477** instances.

## References

### Natural Questions (source task)

**Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, Slav Petrov.** *Natural Questions: A Benchmark for Question Answering Research.* **Transactions of the ACL (TACL)**, 2019.

- **Summary:** Introduces a corpus of real user questions with annotations on Wikipedia pages (long and short answers, including unanswerable cases in the full release); supports training and evaluation of models that **read** full articles rather than isolated snippets.
- **ACL Anthology:** [https://aclanthology.org/Q19-1026/](https://aclanthology.org/Q19-1026/)
- **DOI:** [10.1162/tacl_a_00276](https://doi.org/10.1162/tacl_a_00276)
- **Google Research overview:** [https://ai.google.com/research/NaturalQuestions/](https://ai.google.com/research/NaturalQuestions/)
- **Dataset repository:** [https://github.com/google-research-datasets/natural-questions](https://github.com/google-research-datasets/natural-questions)

### UDA (benchmark suite containing this NQ slice)

**Yulong Hui, Yao Lu, Huanchen Zhang.** *UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis.* **NeurIPS 2024** (Datasets and Benchmarks Track).

- **Abstract (summary):** UDA provides a benchmark for **RAG** in **real-world document analysis** with diverse domains and question types, using real documents and expert annotations; the work analyzes how **parsing and retrieval** interact with generation and highlights practical design choices for document AI systems.
- **arXiv:** [https://arxiv.org/abs/2406.15187](https://arxiv.org/abs/2406.15187)
- **NeurIPS proceedings:** [https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html](https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html)
- **Code:** [https://github.com/qinchuanhui/UDA-Benchmark](https://github.com/qinchuanhui/UDA-Benchmark)

### Related Hub resources

- Aggregated UDA QA reference: [qinchuanhui/UDA-QA](https://huggingface.co/datasets/qinchuanhui/UDA-QA)

## Examples

Illustrative rows; `doc_url` is shortened in prose where helpful.

### Example 1: institutional fact

**`input`:**

```text
who determines the size of the supreme court
```

**`expected_output` (parsed JSON):**

```json
{
  "answers": {
    "long_answer": "Article III of the United States Constitution does not specify the number of justices. The Judiciary Act of 1789 called for the appointment of six `` judges ''. Although an 1801 act would have reduced the size of the court to five members upon its next vacancy, an 1802 act promptly negated the 1801 act, legally restoring the court 's size to six members before any such vacancy occurred. As the nation 's boundaries grew, Congress added justices to correspond with the growing number of judicial circuits: seven in 1807, nine in 1837, and ten in 1863.",
    "short_answer": "Congress added"
  },
  "doc_url": "https://en.wikipedia.org//w/index.php?title=Supreme_Court_of_the_United_States&amp;oldid=815925358"
}
```

### Example 2: short answer empty (narrative gold)

**`input`:**

```text
actress who plays victoria newman on young and the restless
```

**`expected_output` (parsed JSON):**

```json
{
  "answers": {
    "long_answer": "On March 21, 2005, Heinle joined the cast of the CBS soap opera The Young and the Restless, as Victoria Newman, replacing the popular Heather Tom in the role. She won a Daytime Emmy Award for Outstanding Supporting Actress in a Drama Series in 2014 and again in 2015 for the role.",
    "short_answer": ""
  },
  "doc_url": "https://en.wikipedia.org//w/index.php?title=Amelia_Heinle&amp;oldid=850793939"
}
```

**`metadata.value` (example):**

```json
{
  "label_key": "Amelia Heinle",
  "label_file": "nq_qa",
  "doc_url": "https://en.wikipedia.org//w/index.php?title=Amelia_Heinle&amp;oldid=850793939",
  "q_uid": "6002803770874668212"
}
```

## Citation

If you use this dataset, please cite **Natural Questions**, **UDA**, and this dataset record as appropriate:

```bibtex
@article{kwiatkowski-etal-2019-natural,
  title   = {Natural Questions: A Benchmark for Question Answering Research},
  author  = {Kwiatkowski, Tom and Palomaki, Jennimaria and Redfield, Olivia and Collins, Michael and Parikh, Ankur and Alberti, Chris and Epstein, Danielle and Polosukhin, Illia and Devlin, Jacob and Lee, Kenton and Toutanova, Kristina and Jones, Llion and Kelcey, Matthew and Chang, Ming-Wei and Dai, Andrew M. and Uszkoreit, Jakob and Le, Quoc and Petrov, Slav},
  journal = {Transactions of the Association for Computational Linguistics},
  volume  = {7},
  year    = {2019},
  pages   = {453--466}
}
```

```bibtex
@inproceedings{hui2024uda,
  title     = {UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis},
  author    = {Hui, Yulong and Lu, Yao and Zhang, Huanchen},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2024}
}
```

## License

The **Natural Questions** data is distributed by Google under the **Apache License 2.0** (see the [official NQ repository](https://github.com/google-research-datasets/natural-questions)). Use this dataset in compliance with the original **Natural Questions** license, the **UDA** benchmark terms, and this Hub record.

Provider:
orgrctera