orgrctera/uda_tat_qa

Name: orgrctera/uda_tat_qa
Creator: orgrctera
Published: 2026-03-21 06:51:57
License: no description provided

Hugging Face · Updated 2026-03-21 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/orgrctera/uda_tat_qa


Resource description:

---
license: mit
language:
- en
pretty_name: UDA TAT-QA (Retrieval)
size_categories:
- 10K<n<100K
tags:
- finance
- question-answering
- retrieval
- rag
- unstructured-documents
- table-qa
configs:
- config_name: default
  data_files:
  - split: default
    path: data/default-*
dataset_info:
  features:
  - name: input
    dtype: string
  - name: metadata
    dtype: string
  - name: answers
    dtype: string
  - name: facts
    dtype: string
  - name: derivation
    dtype: string
  splits:
  - name: default
    num_bytes: 7146079
    num_examples: 14703
  download_size: 2035587
  dataset_size: 7146079
---

# UDA TAT-QA (`orgrctera/uda_tat_qa`)

## Overview

This dataset is the **TAT-QA** slice of the **UDA (Unstructured Document Analysis)** benchmark: **14,703** question–answer instances derived from real financial reports, packaged for **retrieval-oriented** evaluation in RAG pipelines.

**UDA** is a benchmark suite for Retrieval-Augmented Generation (RAG) over messy, real-world documents (PDF/HTML) where evidence mixes narrative text and tables. In the finance track, UDA includes **TatHybrid**, which provides TAT-QA–aligned labeling at the scale of this release (**170** documents, **14,703** Q&A pairs in the UDA paper's finance configuration).

**TAT-QA** (Zhu et al., ACL 2021) is a large-scale QA benchmark over **hybrid** contexts: each instance combines **semi-structured tables** with **multiple paragraphs** from the same financial report. Questions require diverse **numerical reasoning** (e.g. arithmetic, counting, comparison) and may be answered with spans, numbers, or derived values. UDA adopts TAT-QA–style supervision within its broader document-analysis benchmark (Hui et al., NeurIPS 2024 Datasets & Benchmarks).

In this Hub release, each row is a **retrieval task instance**: systems must **retrieve** the right document regions (tables and text) and **ground** answers in that evidence. This is consistent with the **UDA** setting, where **parsing, chunking, and retrieval** are first-class concerns alongside generation.

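As a rough illustration of what grounding an answer in retrieved evidence can look like, the sketch below computes the fraction of gold evidence strings that appear verbatim in a set of retrieved chunks. The names `normalize` and `fact_recall`, the whitespace and case normalization, and the sample chunks are all assumptions made for illustration; this is not the official UDA or TAT-QA retrieval metric. The gold strings are the `facts` from Example 2 in this card.

```python
from __future__ import annotations


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so minor formatting differences do not block a match."""
    return " ".join(text.lower().split())


def fact_recall(gold_facts: list[str], retrieved_chunks: list[str]) -> float:
    """Fraction of gold facts found as a substring of at least one retrieved chunk."""
    if not gold_facts:
        return 0.0
    chunks = [normalize(chunk) for chunk in retrieved_chunks]
    hits = sum(any(normalize(fact) in chunk for chunk in chunks) for fact in gold_facts)
    return hits / len(gold_facts)


# Hypothetical retrieved chunks (not taken from the dataset).
chunks = [
    "Interest cost on benefit obligation | 1,802 | 1,673",
    "The Company provides certain postretirement health care benefits.",
]

# Gold facts from Example 2 of this card.
print(fact_recall(["1,673", "1,802"], chunks))
```

Plain substring matching is deliberately simple: a short numeric fact such as `1,802` would also match inside `11,802`, so a stricter check (for example, matching on table cells or token boundaries) may be preferable in a real evaluation.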
## Task

- **Task type:** **Retrieval** (within a RAG / document-analysis pipeline) for **TAT-QA**-style financial QA over **table + text** hybrid reports.
- **Input:** A natural-language question (`input`) about figures, policies, or relationships disclosed in corporate financial materials.
- **Supervision / reference:** `expected_output` is a JSON string with gold **answers**, supporting **facts**, and optional **derivation** traces (see below). `metadata` records UDA identifiers (`sub_benchmark`: `tat_qa`).

Evaluation typically combines **retrieval quality** (whether the correct passages or cells are retrieved) with **answer correctness** (span match, numeric accuracy, or reasoning alignment), following TAT-QA and UDA protocols.

## Background

### TAT-QA

TAT-QA targets **question answering over hybrid tabular and textual content** from real financial reports. Compared with text-only or table-only QA, it stresses **joint reasoning**: models must align numbers in tables with narrative explanations, perform **multi-step operations**, and handle **scale** and **answer typing** (e.g. span vs. arithmetic). The original paper reports strong human performance relative to early neural baselines, underscoring dataset difficulty.

### UDA benchmark

UDA revisits RAG and LLM-based document analysis across domains using **2,965** real-world documents and **29,590** expert-annotated Q&A pairs, with sources kept in **original** formats to stress **parsing and retrieval** as well as generation. The **TatHybrid** finance subset corresponds to this dataset's **14,703** examples in the `default` split.

## Data fields

| Column | Type | Description |
|--------|------|-------------|
| `input` | `string` | Question text posed over the report. |
| `expected_output` | `string` | JSON string with fields such as `answers` (`answer`, `answer_type`, `scale`), `facts` (gold evidence strings), and `derivation` (reasoning trace when applicable). |
| `metadata` | struct | `benchmark_name` (`uda_tat_qa`), `benchmark_type` (`uda`), `split`, `sub_benchmark` (`tat_qa`), and `value` (JSON string with identifiers like `label_key`, `label_file`, `q_uid`, `doc_page_uid`). |

**Splits:** Single split `default` with **14,703** examples.

## Examples

The following rows are taken from the dataset (JSON in `expected_output` is shown formatted for readability).

**Example 1: span answer (benefits)**

- **`input`:** `What benefits are provided by the company to qualifying domestic retirees and their eligible dependents?`
- **`expected_output`:**

```json
{
  "answers": {
    "answer": ["certain postretirement health care and life insurance benefits"],
    "answer_type": "span",
    "scale": ""
  },
  "facts": ["certain postretirement health care and life insurance benefits"],
  "derivation": ""
}
```

**Example 2: arithmetic answer (pension interest cost)**

- **`input`:** `What is the change in Interest cost on benefit obligation for pension benefits from December 31, 2018 and 2019?`
- **`expected_output`:**

```json
{
  "answers": {
    "answer": 129,
    "answer_type": "arithmetic",
    "scale": ""
  },
  "facts": ["1,673", "1,802"],
  "derivation": "1,802-1,673"
}
```

**`metadata.value` (structure, example):**

```json
{
  "label_key": "overseas-shipholding-group-inc_2019",
  "label_file": "tat_qa",
  "q_uid": "bbdcf6da614f34fdb63995661c81613f",
  "doc_page_uid": "2ef48dc98e756493f097d01acf8101a2"
}
```

## References

### TAT-QA (source task & data lineage)

Fengbin Zhu, Wenqiang Lei, Youcheng Huang, Chao Wang, Shuo Zhang, Jiancheng Lv, Fuli Feng, Tat-Seng Chua. **TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance.** *ACL 2021*, pages 3086–3101.

- **Abstract (short):** Introduces TAT-QA from real financial reports with hybrid table-and-text contexts; requires numerical reasoning (e.g. arithmetic, counting, comparison). Proposes TAGOP and shows a large gap to human performance.
- **ACL Anthology:** [https://aclanthology.org/2021.acl-long.254/](https://aclanthology.org/2021.acl-long.254/)
- **arXiv:** [https://arxiv.org/abs/2105.07624](https://arxiv.org/abs/2105.07624)
- **Project page:** [https://nextplusplus.github.io/TAT-QA/](https://nextplusplus.github.io/TAT-QA/)
- **Original data (reference):** [next-tat/TAT-QA on Hugging Face](https://huggingface.co/datasets/next-tat/TAT-QA)

### UDA benchmark (suite containing this TAT-QA slice)

Yulong Hui, Yao Lu, Huanchen Zhang. **UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis.** *NeurIPS 2024* (Datasets and Benchmarks Track).

- **Abstract (short):** Presents UDA with thousands of real-world documents and tens of thousands of expert-annotated Q&A pairs; evaluates LLM- and RAG-based document analysis and highlights parsing and retrieval design choices.
- **arXiv:** [https://arxiv.org/abs/2406.15187](https://arxiv.org/abs/2406.15187)
- **NeurIPS proceedings:** [https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html](https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html)
- **Code & resources:** [https://github.com/qinchuanhui/UDA-Benchmark](https://github.com/qinchuanhui/UDA-Benchmark)

### Related Hub resources

- UDA QA aggregation (reference): [qinchuanhui/UDA-QA](https://huggingface.co/datasets/qinchuanhui/UDA-QA)

## Citation

If you use this dataset, please cite **both** TAT-QA and UDA (and this dataset record as appropriate):

```bibtex
@inproceedings{zhu-etal-2021-tat,
  title     = {TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance},
  author    = {Zhu, Fengbin and Lei, Wenqiang and Huang, Youcheng and Wang, Chao and Zhang, Shuo and Lv, Jiancheng and Feng, Fuli and Chua, Tat-Seng},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
  year      = {2021},
  pages     = {3086--3101}
}
```

```bibtex
@article{hui2024uda,
  title   = {UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis},
  author  = {Hui, Yulong and Lu, Yao and Zhang, Huanchen},
  journal = {arXiv preprint arXiv:2406.15187},
  year    = {2024}
}
```

## License

Use this dataset in compliance with the **original TAT-QA** and **UDA** data licenses and terms. The TAT-QA project page and repository document licensing for the underlying benchmark; verify conditions for your use case before redistribution or commercial use.
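Returning to the record format shown in the Examples section, the following is a minimal sketch of how an `expected_output` string might be parsed and its `derivation` checked against the gold arithmetic answer. It assumes rows carry `expected_output` as a JSON string, as the Data fields table describes; the front matter instead lists `answers`, `facts`, and `derivation` as separate string columns, so the actual layout should be checked before relying on this. The helper names `eval_derivation` and `check_record`, the comma-stripping rule, and the tolerance are illustrative assumptions, not part of the dataset or of either benchmark's tooling.

```python
from __future__ import annotations

import ast
import json
import operator

# Operators permitted in a derivation string; anything else is rejected.
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def eval_derivation(derivation: str) -> float:
    """Evaluate a plain arithmetic derivation such as '1,802-1,673' without using eval()."""
    # Assumption: commas are thousands separators, as in Example 2.
    expr = derivation.replace(",", "").strip()
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"unsupported derivation: {derivation!r}") from exc

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported derivation: {derivation!r}")

    return walk(tree)


def check_record(expected_output: str, tol: float = 1e-6) -> bool | None:
    """Return True/False for arithmetic records, or None when there is nothing to verify."""
    record = json.loads(expected_output)
    answers = record["answers"]
    if answers["answer_type"] != "arithmetic" or not record["derivation"]:
        return None
    return abs(eval_derivation(record["derivation"]) - float(answers["answer"])) <= tol


# Example 2 from this card, serialized as a JSON string.
example_2 = json.dumps({
    "answers": {"answer": 129, "answer_type": "arithmetic", "scale": ""},
    "facts": ["1,673", "1,802"],
    "derivation": "1,802-1,673",
})
print(check_record(example_2))
```

Gold answers elsewhere in the data may be rounded or expressed with a non-empty `scale`, in which case a strict tolerance like the one assumed here would need to be loosened or made scale-aware.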

Provider:

orgrctera
