orgrctera/uda_tat_qa
Source: Hugging Face. Updated 2026-03-21; indexed 2026-03-29.
Download link: https://hf-mirror.com/datasets/orgrctera/uda_tat_qa
Resource description:
---
license: mit
language:
- en
pretty_name: UDA TAT-QA (Retrieval)
size_categories:
- 10K<n<100K
tags:
- finance
- question-answering
- retrieval
- rag
- unstructured-documents
- table-qa
configs:
- config_name: default
  data_files:
  - split: default
    path: data/default-*
dataset_info:
  features:
  - name: input
    dtype: string
  - name: metadata
    dtype: string
  - name: answers
    dtype: string
  - name: facts
    dtype: string
  - name: derivation
    dtype: string
  splits:
  - name: default
    num_bytes: 7146079
    num_examples: 14703
  download_size: 2035587
  dataset_size: 7146079
---
# UDA TAT-QA (`orgrctera/uda_tat_qa`)
## Overview
This dataset is the **TAT-QA** slice of the **UDA (Unstructured Document Analysis)** benchmark: **14,703** question–answer instances derived from real financial reports, packaged for **retrieval-oriented** evaluation in RAG pipelines.
**UDA** is a benchmark suite for Retrieval-Augmented Generation (RAG) over messy, real-world documents (PDF/HTML) where evidence mixes narrative text and tables. In the finance track, UDA includes **TatHybrid**, a subset with TAT-QA-aligned labels whose size matches this release: **170** documents and **14,703** Q&A pairs in the UDA paper’s finance configuration.
**TAT-QA** (Zhu et al., ACL 2021) is a large-scale QA benchmark over **hybrid** contexts: each instance combines **semi-structured tables** with **multiple paragraphs** from the same financial report. Questions require diverse **numerical reasoning** (e.g. arithmetic, counting, comparison) and may be answered with spans, numbers, or derived values. UDA adopts TAT-QA–style supervision within its broader document-analysis benchmark (Hui et al., NeurIPS 2024 Datasets & Benchmarks).
In this Hub release, each row is a **retrieval task instance**: systems must **retrieve** the right document regions (tables and text) and **ground** answers in that evidence. This matches the **UDA** setting, where **parsing, chunking, and retrieval** are evaluated alongside generation rather than assumed to be solved.
## Task
- **Task type:** **Retrieval** (within a RAG / document-analysis pipeline) for **TAT-QA**-style financial QA over **table + text** hybrid reports.
- **Input:** A natural-language question (`input`) about figures, policies, or relationships disclosed in corporate financial materials.
- **Supervision / reference:** `expected_output` is a JSON string with gold **answers**, supporting **facts**, and optional **derivation** traces (see below). `metadata` records UDA identifiers (`sub_benchmark`: `tat_qa`).
Evaluation typically combines **retrieval quality** (whether the correct passages or cells are retrieved) with **answer correctness** (span match, numeric accuracy, or reasoning alignment), following TAT-QA and UDA protocols.
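The two evaluation axes described above could be scored per row roughly as follows. This is a minimal sketch, not the official TAT-QA or UDA scoring code: the helper names `fact_recall` and `answer_matches`, the substring-based matching rule, the numeric tolerance, and the retrieved chunk strings are all illustrative assumptions. Only the `expected_output` payload is taken from Example 2 of this card.

```python
import json


def fact_recall(gold_facts, retrieved_chunks):
    """Fraction of gold facts found verbatim in at least one retrieved chunk.

    Assumption: substring matching is only a rough proxy for retrieval
    quality; the official protocols may use different criteria.
    """
    if not gold_facts:
        return 1.0
    hits = sum(any(fact in chunk for chunk in retrieved_chunks) for fact in gold_facts)
    return hits / len(gold_facts)


def answer_matches(gold_answer, predicted, tol=1e-6):
    """Compare a prediction with a gold answer (number, string, or list of strings)."""
    if isinstance(gold_answer, (int, float)) and not isinstance(gold_answer, bool):
        try:
            # Strip thousands separators before the numeric comparison.
            return abs(float(str(predicted).replace(",", "")) - gold_answer) <= tol
        except ValueError:
            return False
    gold = gold_answer if isinstance(gold_answer, list) else [gold_answer]
    pred = predicted if isinstance(predicted, list) else [predicted]

    def normalize(text):
        # Lowercase and collapse whitespace so trivial differences are ignored.
        return " ".join(str(text).lower().split())

    return sorted(map(normalize, gold)) == sorted(map(normalize, pred))


# `expected_output` from Example 2 of this card.
expected = json.loads(
    '{"answers": {"answer": 129, "answer_type": "arithmetic", "scale": ""},'
    ' "facts": ["1,673", "1,802"], "derivation": "1,802-1,673"}'
)
# Hypothetical retriever output, invented for illustration only.
chunks = ["Interest cost on benefit obligation | 1,802 | 1,673", "Unrelated paragraph."]

recall = fact_recall(expected["facts"], chunks)
correct = answer_matches(expected["answers"]["answer"], "129")
print(recall, correct)
```

Keeping the retrieval score and the answer score separate makes it easier to tell whether a failure comes from missing evidence or from faulty reasoning over evidence that was retrieved.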
## Background
### TAT-QA
TAT-QA targets **question answering over hybrid tabular and textual content** from real financial reports. Compared with text-only or table-only QA, it stresses **joint reasoning**: models must align numbers in tables with narrative explanations, perform **multi-step operations**, and predict the answer's **scale** (for example thousand, million, or percent) and **answer type** (e.g. span vs. arithmetic). The original paper reports a large gap between its neural baselines and human performance, underscoring the dataset's difficulty.
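To make the role of the scale field concrete, the sketch below converts a numeric answer plus its scale label into a plain number. The helper `apply_scale` is hypothetical, and the scale vocabulary (empty string, `thousand`, `million`, `billion`, `percent`) and the choice to treat `percent` as division by 100 are assumptions about one reasonable convention, not the benchmark's official normalization.

```python
# Assumed scale vocabulary; verify against the actual `scale` values in the data.
_SCALE_FACTORS = {
    "": 1,
    "thousand": 1_000,
    "million": 1_000_000,
    "billion": 1_000_000_000,
}


def apply_scale(value, scale):
    """Return `value` expressed as a plain number according to `scale`."""
    if scale == "percent":
        # One possible convention: 25 percent becomes the fraction 0.25.
        return value / 100
    if scale not in _SCALE_FACTORS:
        raise ValueError(f"unknown scale: {scale!r}")
    return value * _SCALE_FACTORS[scale]


print(apply_scale(1.5, "million"))
print(apply_scale(25, "percent"))
```

Whether to compare answers before or after such normalization is a design choice; comparing the raw value and the scale label separately avoids penalizing a correct number twice.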
### UDA benchmark
UDA revisits RAG and LLM-based document analysis across domains using **2,965** real-world documents and **29,590** expert-annotated Q&A pairs, with sources kept in **original** formats to stress **parsing and retrieval** as well as generation. The **TatHybrid** finance subset corresponds to this dataset’s **14,703** examples in the `default` split.
## Data fields
| Column | Type | Description |
|--------|------|-------------|
| `input` | `string` | Question text posed over the report. |
| `expected_output` | `string` | JSON string with fields such as `answers` (`answer`, `answer_type`, `scale`), `facts` (gold evidence strings), and `derivation` (reasoning trace when applicable). |
| `metadata` | struct | `benchmark_name` (`uda_tat_qa`), `benchmark_type` (`uda`), `split`, `sub_benchmark` (`tat_qa`), and `value` (JSON string with identifiers like `label_key`, `label_file`, `q_uid`, `doc_page_uid`). |
**Splits:** Single split `default` with **14,703** examples.
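A minimal sketch of decoding one row into plain Python objects, assuming the column layout in the table above. The helper `parse_row` is hypothetical, and the sample row pairs the Example 1 question with the `metadata.value` sample shown in this card purely for illustration; that pairing is not claimed to exist in the data.

```python
import json


def parse_row(row):
    """Decode the JSON-encoded fields of one row."""
    expected = json.loads(row["expected_output"])
    metadata = row["metadata"]
    # Assumption: `metadata` may arrive as a dict (struct) or as a JSON string.
    if isinstance(metadata, str):
        metadata = json.loads(metadata)
    value = metadata.get("value")
    ids = json.loads(value) if isinstance(value, str) else (value or {})
    return {
        "question": row["input"],
        "answer": expected["answers"]["answer"],
        "answer_type": expected["answers"]["answer_type"],
        "scale": expected["answers"]["scale"],
        "facts": expected["facts"],
        "derivation": expected["derivation"],
        "doc_key": ids.get("label_key"),
    }


# Illustrative row assembled from values shown elsewhere in this card.
sample_row = {
    "input": "What benefits are provided by the company to qualifying "
             "domestic retirees and their eligible dependents?",
    "expected_output": json.dumps({
        "answers": {
            "answer": ["certain postretirement health care and life insurance benefits"],
            "answer_type": "span",
            "scale": "",
        },
        "facts": ["certain postretirement health care and life insurance benefits"],
        "derivation": "",
    }),
    "metadata": {
        "benchmark_name": "uda_tat_qa",
        "benchmark_type": "uda",
        "sub_benchmark": "tat_qa",
        "value": json.dumps({
            "label_key": "overseas-shipholding-group-inc_2019",
            "label_file": "tat_qa",
            "q_uid": "bbdcf6da614f34fdb63995661c81613f",
            "doc_page_uid": "2ef48dc98e756493f097d01acf8101a2",
        }),
    },
}

parsed = parse_row(sample_row)
print(parsed["answer_type"], parsed["doc_key"])
```

With the Hugging Face `datasets` library installed, rows could presumably be obtained with `load_dataset("orgrctera/uda_tat_qa", split="default")` and passed to `parse_row` one at a time. Checking `column_names` first is prudent, since the YAML header and the table above describe the columns differently.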
## Examples
The following rows are taken from the dataset (JSON in `expected_output` is shown formatted for readability).
**Example 1 — span answer (benefits)**
- **`input`:** `What benefits are provided by the company to qualifying domestic retirees and their eligible dependents?`
- **`expected_output`:**
```json
{
  "answers": {
    "answer": ["certain postretirement health care and life insurance benefits"],
    "answer_type": "span",
    "scale": ""
  },
  "facts": ["certain postretirement health care and life insurance benefits"],
  "derivation": ""
}
```
**Example 2 — arithmetic answer (pension interest cost)**
- **`input`:** `What is the change in Interest cost on benefit obligation for pension benefits from December 31, 2018 and 2019?`
- **`expected_output`:**
```json
{
  "answers": {
    "answer": 129,
    "answer_type": "arithmetic",
    "scale": ""
  },
  "facts": ["1,673", "1,802"],
  "derivation": "1,802-1,673"
}
```
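The `derivation` string in Example 2 can be checked against the gold answer by evaluating it. The sketch below uses a small, restricted arithmetic evaluator built on Python's `ast` module instead of `eval`. The helper `eval_derivation` is hypothetical, and it assumes derivations contain only numbers, `+ - * /`, and parentheses, with commas acting as thousands separators; derivations in other forms would need extra handling.

```python
import ast
import operator

# Binary operators the evaluator accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_derivation(derivation):
    """Evaluate a simple arithmetic derivation such as "1,802-1,673"."""
    # Assumption: every comma is a thousands separator and can be dropped.
    expr = derivation.replace(",", "").strip()
    tree = ast.parse(expr, mode="eval")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {derivation!r}")

    return walk(tree)


result = eval_derivation("1,802-1,673")
print(result)  # 129, matching the gold answer in Example 2
```

Walking the syntax tree and rejecting anything outside a small whitelist avoids executing arbitrary code from dataset strings.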
**`metadata.value` (structure, example):**
```json
{
  "label_key": "overseas-shipholding-group-inc_2019",
  "label_file": "tat_qa",
  "q_uid": "bbdcf6da614f34fdb63995661c81613f",
  "doc_page_uid": "2ef48dc98e756493f097d01acf8101a2"
}
```
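One use of these identifiers is to group questions by source document before building a per-document retrieval index. The sketch below assumes that `label_key` identifies the source report and that `metadata` is available as a dict whose `value` is a JSON string; both are assumptions. The helper `group_by_document` is hypothetical, and every identifier in the sample rows other than those shown in the example above is an invented placeholder.

```python
import json
from collections import defaultdict


def group_by_document(rows):
    """Map each `label_key` to the list of `q_uid` values asked about it."""
    groups = defaultdict(list)
    for row in rows:
        ids = json.loads(row["metadata"]["value"])
        groups[ids["label_key"]].append(ids["q_uid"])
    return dict(groups)


def make_row(label_key, q_uid):
    """Build a minimal row holding only the metadata needed for grouping."""
    return {"metadata": {"value": json.dumps({"label_key": label_key, "q_uid": q_uid})}}


rows = [
    # Identifiers from the example in this card.
    make_row("overseas-shipholding-group-inc_2019", "bbdcf6da614f34fdb63995661c81613f"),
    # Placeholder identifiers, invented for illustration.
    make_row("overseas-shipholding-group-inc_2019", "placeholder-q-2"),
    make_row("placeholder-company_2018", "placeholder-q-3"),
]

groups = group_by_document(rows)
for doc_key, q_uids in groups.items():
    print(doc_key, len(q_uids))
```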
## References
### TAT-QA (source task & data lineage)
Fengbin Zhu, Wenqiang Lei, Youcheng Huang, Chao Wang, Shuo Zhang, Jiancheng Lv, Fuli Feng, Tat-Seng Chua. **TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance.** *ACL 2021*, pages 3086–3101.
- **Abstract (short):** Introduces TAT-QA from real financial reports with hybrid table-and-text contexts; requires numerical reasoning (e.g. arithmetic, counting, comparison). Proposes TAGOP and shows a large gap to human performance.
- **ACL Anthology:** [https://aclanthology.org/2021.acl-long.254/](https://aclanthology.org/2021.acl-long.254/)
- **arXiv:** [https://arxiv.org/abs/2105.07624](https://arxiv.org/abs/2105.07624)
- **Project page:** [https://nextplusplus.github.io/TAT-QA/](https://nextplusplus.github.io/TAT-QA/)
- **Original data (reference):** [next-tat/TAT-QA on Hugging Face](https://huggingface.co/datasets/next-tat/TAT-QA)
### UDA benchmark (suite containing this TAT-QA slice)
Yulong Hui, Yao Lu, Huanchen Zhang. **UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis.** *NeurIPS 2024* (Datasets and Benchmarks Track).
- **Abstract (short):** Presents UDA with thousands of real-world documents and tens of thousands of expert-annotated Q&A pairs; evaluates LLM- and RAG-based document analysis and highlights parsing and retrieval design choices.
- **arXiv:** [https://arxiv.org/abs/2406.15187](https://arxiv.org/abs/2406.15187)
- **NeurIPS proceedings:** [https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html](https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html)
- **Code & resources:** [https://github.com/qinchuanhui/UDA-Benchmark](https://github.com/qinchuanhui/UDA-Benchmark)
### Related Hub resources
- UDA QA aggregation (reference): [qinchuanhui/UDA-QA](https://huggingface.co/datasets/qinchuanhui/UDA-QA)
## Citation
If you use this dataset, please cite **both** TAT-QA and UDA (and this dataset record as appropriate):
```bibtex
@inproceedings{zhu-etal-2021-tat,
  title = {TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance},
  author = {Zhu, Fengbin and Lei, Wenqiang and Huang, Youcheng and Wang, Chao and Zhang, Shuo and Lv, Jiancheng and Feng, Fuli and Chua, Tat-Seng},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
  year = {2021},
  pages = {3086--3101}
}
```
```bibtex
@article{hui2024uda,
  title = {UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis},
  author = {Hui, Yulong and Lu, Yao and Zhang, Huanchen},
  journal = {arXiv preprint arXiv:2406.15187},
  year = {2024}
}
```
## License
Use this dataset in compliance with the **original TAT-QA** and **UDA** data licenses and terms. The TAT-QA project page and repository document licensing for the underlying benchmark; verify conditions for your use case before redistribution or commercial use.
Provider: orgrctera



