orgrctera/uda_tat_qa_qa

Name: orgrctera/uda_tat_qa_qa
Creator: orgrctera
Published: 2026-03-21 06:51:50
License: No description provided

Hugging Face · Updated 2026-03-21 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/orgrctera/uda_tat_qa_qa

Resource description:

---
license: mit
language:
- en
pretty_name: UDA TAT-QA (Question Answering)
size_categories:
- 10K<n<100K
tags:
- finance
- question-answering
- table-qa
- rag
- unstructured-documents
- numerical-reasoning
configs:
- config_name: default
  data_files:
  - split: default
    path: data/default-*
dataset_info:
  features:
  - name: input
    dtype: string
  - name: metadata
    dtype: string
  - name: answers
    dtype: string
  - name: facts
    dtype: string
  - name: derivation
    dtype: string
  splits:
  - name: default
    num_bytes: 7146079
    num_examples: 14703
  download_size: 2035587
  dataset_size: 7146079
---

# UDA TAT-QA: Question Answering (`orgrctera/uda_tat_qa_qa`)

## Overview

This release is the **Question Answering (QA)** packaging of the **TAT-QA** slice from the **UDA (Unstructured Document Analysis)** benchmark: **14,703** labeled question–answer instances grounded in real **financial reports** where evidence mixes **tables and narrative text**.

**UDA** (Hui et al., NeurIPS 2024) evaluates retrieval-augmented generation and LLM-based document analysis on **original** PDF/HTML documents. Its finance track includes **TatHybrid**: **TAT-QA-aligned** annotations spanning **170** documents and **14,703** Q&A pairs in the configuration described in the UDA paper.

**TAT-QA** (Zhu et al., ACL 2021), the *Tabular And Textual dataset for Question Answering*, introduces large-scale QA over **hybrid** contexts: each instance pairs a **semi-structured table** with **multiple paragraphs** from the same report. Questions demand **numerical reasoning** (arithmetic, counting, comparison, sorting, and compositions), **span extraction**, and careful handling of **scales** and **answer types**. The original work reports a large gap between strong neural baselines and **human** performance, highlighting the difficulty of joint table–text reasoning in finance.

In this Hub dataset, each row is a **QA supervision instance**: the model’s goal is to **answer the question** using the report (typically after retrieval and/or layout parsing in a full pipeline). Gold **answers**, **supporting facts**, and optional **derivation** strings are provided in structured form for training and evaluation.

## Task: Question Answering for TAT-QA

- **Task type:** **Question Answering** on **TAT-QA-style** financial QA: given a natural-language **question** about figures, policies, or relationships in a disclosure, systems should produce the **correct answer** with evidence grounded in **table and/or text**.
- **Input:** `input`, the **question** string.
- **Supervision:** `expected_output`, a JSON string with gold **`answers`** (value, `answer_type`, `scale`), **`facts`** (strings pointing to relevant evidence), and optional **`derivation`** (e.g. arithmetic trace). `metadata` ties each row to UDA/TAT-QA identifiers (`sub_benchmark`: `tat_qa`).

Evaluation follows TAT-QA conventions (e.g. span matching, numeric checks, reasoning alignment) and can be combined with retrieval or parsing metrics when embedded in a **RAG** or **document AI** stack.

## Background

### TAT-QA (source task)

TAT-QA is built from **real-world financial reports**. Hybrid **table + text** contexts require models to **align** numbers across modalities, perform **multi-step** operations, and output answers as **spans**, **numbers**, or **computed** values. The benchmark popularized **TAGOP**-style pipelines (tagging cells/spans then symbolic reasoning) and remains a standard reference for **financial hybrid QA**.

### UDA benchmark (suite)

UDA provides **2,965** real-world documents and **29,590** expert-annotated Q&A pairs across domains, stressing **parsing**, **chunking**, and **retrieval** alongside generation. The **TatHybrid** subset corresponds to this dataset’s **14,703** examples on the `default` split.
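
As a rough illustration of the supervision format described under the Task heading, the sketch below parses an `expected_output` JSON string and checks that an arithmetic `derivation` reproduces the gold answer. It is a minimal sketch, not the official TAT-QA scorer: the helper names `eval_derivation` and `derivation_matches` are hypothetical, the sample row is hand-assembled from the arithmetic example in this card, and the code assumes that commas in a derivation are thousands separators and that only `+`, `-`, `*`, `/` and parentheses appear.

```python
import ast
import json
import operator

# Hypothetical row, hand-assembled from the arithmetic example in this card.
row = {
    "input": (
        "What is the change in Interest cost on benefit obligation for "
        "pension benefits from December 31, 2018 and 2019?"
    ),
    "expected_output": json.dumps(
        {
            "answers": {"answer": 129, "answer_type": "arithmetic", "scale": ""},
            "facts": ["1,673", "1,802"],
            "derivation": "1,802-1,673",
        }
    ),
}

# Only these binary operators are accepted (an assumption of this sketch).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_derivation(derivation: str) -> float:
    """Evaluate a simple derivation such as '1,802-1,673' without eval()."""
    # Assumption: commas are thousands separators, not argument separators.
    expr = derivation.replace(",", "")

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported derivation: {derivation!r}")

    return _eval(ast.parse(expr, mode="eval"))


def derivation_matches(expected_output: str, tol: float = 1e-6):
    """Return True/False for arithmetic rows, None when there is nothing to check."""
    gold = json.loads(expected_output)
    answers = gold["answers"]
    if answers["answer_type"] != "arithmetic" or not gold["derivation"]:
        return None
    return abs(eval_derivation(gold["derivation"]) - float(answers["answer"])) <= tol


print(derivation_matches(row["expected_output"]))
```

Walking the syntax tree instead of calling `eval` keeps the check restricted to plain arithmetic; derivations that use other notation would raise an error and need separate handling.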

## Data fields

| Column | Type | Description |
|--------|------|-------------|
| `input` | `string` | Natural-language **question** about the report. |
| `expected_output` | `string` | JSON: `answers` (with `answer`, `answer_type`, `scale`), `facts`, `derivation`. |
| `metadata` | struct | `benchmark_name` (`uda_tat_qa`), `benchmark_type` (`uda`), `split`, `sub_benchmark` (`tat_qa`), `value` (JSON string with ids such as `label_key`, `label_file`, `q_uid`, `doc_page_uid`). |

**Splits:** `default`, with **14,703** examples.

## Examples

**Example 1: span answer (benefits)**

- **`input`:** `What benefits are provided by the company to qualifying domestic retirees and their eligible dependents?`
- **`expected_output`:**

```json
{
  "answers": {
    "answer": [
      "certain postretirement health care and life insurance benefits"
    ],
    "answer_type": "span",
    "scale": ""
  },
  "facts": [
    "certain postretirement health care and life insurance benefits"
  ],
  "derivation": ""
}
```

**Example 2: arithmetic answer (pension interest cost)**

- **`input`:** `What is the change in Interest cost on benefit obligation for pension benefits from December 31, 2018 and 2019?`
- **`expected_output`:**

```json
{
  "answers": {
    "answer": 129,
    "answer_type": "arithmetic",
    "scale": ""
  },
  "facts": ["1,673", "1,802"],
  "derivation": "1,802-1,673"
}
```

**`metadata.value` (illustrative):**

```json
{
  "label_key": "overseas-shipholding-group-inc_2019",
  "label_file": "tat_qa",
  "q_uid": "bbdcf6da614f34fdb63995661c81613f",
  "doc_page_uid": "2ef48dc98e756493f097d01acf8101a2"
}
```

## References

### TAT-QA: primary data lineage

Fengbin Zhu, Wenqiang Lei, Youcheng Huang, Chao Wang, Shuo Zhang, Jiancheng Lv, Fuli Feng, Tat-Seng Chua. **TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance.** *Proceedings of ACL 2021*, pages 3086–3101.

**Abstract (from the publication):** The paper introduces TAT-QA, with **16,552** questions over **2,757** hybrid contexts from financial reports; each context combines a table with related paragraphs. Questions require diverse **numerical reasoning**; answers may be spans or computed values. The authors propose **TAGOP** and report **58.0** F1 versus **90.8** F1 for human experts, motivating continued research on hybrid financial QA.

- **ACL Anthology:** [https://aclanthology.org/2021.acl-long.254/](https://aclanthology.org/2021.acl-long.254/)
- **arXiv:** [https://arxiv.org/abs/2105.07624](https://arxiv.org/abs/2105.07624)
- **Project page:** [https://nextplusplus.github.io/TAT-QA/](https://nextplusplus.github.io/TAT-QA/)
- **Reference HF dataset:** [next-tat/TAT-QA](https://huggingface.co/datasets/next-tat/TAT-QA)
- **Code:** [NExTplusplus/TAT-QA](https://github.com/NExTplusplus/TAT-QA)

### UDA: benchmark suite (TatHybrid / finance)

Yulong Hui, Yao Lu, Huanchen Zhang. **UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis.** *NeurIPS 2024* (Datasets and Benchmarks Track).

**Abstract (summary):** UDA aggregates thousands of documents and tens of thousands of expert Q&A pairs across finance, academia, and Wikipedia-style settings, with documents kept in **original** formats to evaluate **end-to-end** RAG and analysis, including **retrieval** and **layout** challenges.

- **arXiv:** [https://arxiv.org/abs/2406.15187](https://arxiv.org/abs/2406.15187)
- **NeurIPS proceedings:** [https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html](https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html)
- **Repository:** [https://github.com/qinchuanhui/UDA-Benchmark](https://github.com/qinchuanhui/UDA-Benchmark)
- **Related Hub aggregation:** [qinchuanhui/UDA-QA](https://huggingface.co/datasets/qinchuanhui/UDA-QA)

## Citation

If you use this dataset, cite **TAT-QA** and **UDA**, and reference this dataset record as appropriate:

```bibtex
@inproceedings{zhu-etal-2021-tat,
  title     = {TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance},
  author    = {Zhu, Fengbin and Lei, Wenqiang and Huang, Youcheng and Wang, Chao and Zhang, Shuo and Lv, Jiancheng and Feng, Fuli and Chua, Tat-Seng},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
  year      = {2021},
  pages     = {3086--3101}
}
```

```bibtex
@article{hui2024uda,
  title   = {UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis},
  author  = {Hui, Yulong and Lu, Yao and Zhang, Huanchen},
  journal = {arXiv preprint arXiv:2406.15187},
  year    = {2024}
}
```

## License

Use this dataset in accordance with the **TAT-QA** and **UDA** licenses and terms for the underlying benchmarks. Confirm suitability for your use case (including commercial or redistribution constraints) via the original project pages and licenses.

Provider:

orgrctera
