orgrctera/uda_tat_qa_qa
Hugging Face · updated 2026-03-21 · indexed 2026-03-29
Download link: https://hf-mirror.com/datasets/orgrctera/uda_tat_qa_qa
Resource summary:
---
license: mit
language:
- en
pretty_name: UDA TAT-QA (Question Answering)
size_categories:
- 10K<n<100K
tags:
- finance
- question-answering
- table-qa
- rag
- unstructured-documents
- numerical-reasoning
configs:
- config_name: default
data_files:
- split: default
path: data/default-*
dataset_info:
features:
- name: input
dtype: string
- name: metadata
dtype: string
- name: answers
dtype: string
- name: facts
dtype: string
- name: derivation
dtype: string
splits:
- name: default
num_bytes: 7146079
num_examples: 14703
download_size: 2035587
dataset_size: 7146079
---
# UDA TAT-QA — Question Answering (`orgrctera/uda_tat_qa_qa`)
## Overview
This release is the **Question Answering (QA)** packaging of the **TAT-QA** slice from the **UDA (Unstructured Document Analysis)** benchmark: **14,703** labeled question–answer instances grounded in real **financial reports** where evidence mixes **tables and narrative text**.
**UDA** (Hui et al., NeurIPS 2024) evaluates retrieval-augmented generation and LLM-based document analysis on **original** PDF/HTML documents. Its finance track includes **TatHybrid**, a set of **TAT-QA–aligned** annotations that, as configured in the UDA paper, spans **170** documents and **14,703** Q&A pairs.
**TAT-QA** (Zhu et al., ACL 2021), short for *Tabular And Textual dataset for Question Answering*, introduces large-scale QA over **hybrid** contexts: each instance pairs a **semi-structured table** with **multiple paragraphs** from the same report. Questions demand **numerical reasoning** (arithmetic, counting, comparison, sorting, and compositions of these), **span extraction**, and careful handling of **scales** and **answer types**. The original work reports a large gap between its strongest model, **TAGOP** (58.0 F1), and **human** experts (90.8 F1), highlighting the difficulty of joint table–text reasoning in finance.
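Scale is easy to get wrong: the same figure can appear as `1,802` in a table whose header states that amounts are in thousands. The sketch below converts a reported figure and its scale label into an absolute number. It assumes the scale labels in TAT-QA gold answers are `""`, `thousand`, `million`, `billion`, and `percent`, and that negatives may be written in parentheses; `SCALE_FACTORS` and `to_absolute` are illustrative names, not part of the TAT-QA or UDA tooling.

```python
from typing import Union

# Multipliers for the scale labels found in TAT-QA-style gold answers.
# Assumption: every scale is one of "", "thousand", "million", "billion", "percent".
SCALE_FACTORS = {
    "": 1.0,
    "thousand": 1e3,
    "million": 1e6,
    "billion": 1e9,
    "percent": 0.01,
}


def to_absolute(value: Union[str, float], scale: str) -> float:
    """Convert a reported figure plus its scale label into an absolute number."""
    if isinstance(value, str):
        # Drop thousands separators and a trailing percent sign: "1,802" -> "1802".
        value = value.replace(",", "").strip().rstrip("%")
        # Financial tables often write negatives in parentheses: "(1673)" -> "-1673".
        if value.startswith("(") and value.endswith(")"):
            value = "-" + value[1:-1]
    return float(value) * SCALE_FACTORS[scale.strip().lower()]
```

For example, `to_absolute("1,802", "thousand")` returns `1802000.0`. Comparing values on this absolute footing is one way to catch answers that are numerically right but carry the wrong scale.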
In this Hub dataset, each row is a **QA supervision instance**: the model's goal is to **answer the question** from the underlying report. Rows carry only the question, not the report text, so a full pipeline typically adds retrieval and/or layout parsing and uses the identifiers in `metadata` to locate the source document. Gold **answers**, **supporting facts**, and optional **derivation** strings are provided in structured form for training and evaluation.
## Task: Question Answering for TAT-QA
- **Task type:** **Question Answering** on **TAT-QA–style** financial QA: given a natural-language **question** about figures, policies, or relationships in a disclosure, systems should produce the **correct answer** with evidence grounded in **table and/or text**.
- **Input:** `input` — the **question** string.
- **Supervision:** `expected_output` — JSON with gold **`answers`** (value, `answer_type`, `scale`), **`facts`** (strings pointing to relevant evidence), and optional **`derivation`** (e.g. arithmetic trace). `metadata` ties each row to UDA/TAT-QA identifiers (`sub_benchmark`: `tat_qa`).
Evaluation follows TAT-QA conventions: predictions are scored with Exact Match (EM) and a numeracy-focused F1 adapted from DROP. These scores can be combined with retrieval or parsing metrics when the model is embedded in a **RAG** or **document AI** stack.
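As a rough illustration of the Exact Match side of that protocol, the sketch below compares a prediction with a gold answer after light normalization. It is a simplified stand-in, not the official TAT-QA evaluation script. It assumes numbers are compared after rounding to two decimals, text spans after lowercasing and removing punctuation and articles, multi-span answers as unordered collections, and that the `scale` labels must agree. It does not implement the numeracy-focused F1, and `exact_match` is a hypothetical helper.

```python
import re
import string
from typing import List, Optional, Union

_PUNCT = set(string.punctuation)
Answer = Union[str, float, List[Union[str, float]]]


def _as_number(item) -> Optional[float]:
    """Parse strings such as '1,802', '(12)' or '4.2%' as floats; None otherwise."""
    s = str(item).replace(",", "").replace("$", "").strip().rstrip("%")
    if s.startswith("(") and s.endswith(")"):
        s = "-" + s[1:-1]
    try:
        return float(s)
    except ValueError:
        return None


def _canonical(item) -> str:
    """Numbers become 2-decimal strings; text is lowercased, de-punctuated, de-articled."""
    num = _as_number(item)
    if num is not None:
        return f"{num:.2f}"
    text = "".join(ch for ch in str(item).lower() if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred: Answer, pred_scale: str, gold: Answer, gold_scale: str) -> bool:
    """Simplified EM: same multiset of canonical items and the same scale label."""
    if pred_scale.strip().lower() != gold_scale.strip().lower():
        return False
    pred_items = pred if isinstance(pred, list) else [pred]
    gold_items = gold if isinstance(gold, list) else [gold]
    return sorted(map(_canonical, pred_items)) == sorted(map(_canonical, gold_items))
```

Treating a scale mismatch as a miss reflects the card's emphasis on scales; a stricter or looser policy is easy to swap in at the first check.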
## Background
### TAT-QA (source task)
TAT-QA is built from **real-world financial reports**. Hybrid **table + text** contexts require models to **align** numbers across modalities, perform **multi-step** operations, and output answers as **spans**, **numbers**, or **computed** values. The benchmark popularized **TAGOP**-style pipelines (tagging cells/spans then symbolic reasoning) and remains a standard reference for **financial hybrid QA**.
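TAGOP's second stage can be pictured as a lookup from a predicted operator to a function over the tagged values. The sketch below is a hypothetical, simplified version of that step only: the operator names, the `apply_operator` helper, and the convention that operand order equals list order are assumptions of this example, and the span-selection operators and scale prediction are left out.

```python
import math
from typing import Callable, Dict, List

# Hypothetical operator table, loosely modeled on TAGOP's numeric aggregation
# operators; the tagging model that selects cells and spans is not shown.
NUMERIC_OPERATORS: Dict[str, Callable[[List[float]], float]] = {
    "sum": lambda xs: float(sum(xs)),
    "average": lambda xs: sum(xs) / len(xs),
    "multiplication": lambda xs: float(math.prod(xs)),
    "difference": lambda xs: xs[0] - xs[1],
    "division": lambda xs: xs[0] / xs[1],
    "change_ratio": lambda xs: (xs[0] - xs[1]) / xs[1],
}


def apply_operator(op: str, tagged: List[str]) -> float:
    """Apply one aggregation operator to the strings tagged as evidence.

    Operand order for difference/division/change_ratio is simply list order here.
    """
    if op == "count":
        # Counting works on any tagged items, numeric or not.
        return float(len(tagged))
    values = [float(t.replace(",", "")) for t in tagged]
    return NUMERIC_OPERATORS[op](values)
```

Applied to the two facts from the pension interest-cost example in the Examples section, `apply_operator("difference", ["1,802", "1,673"])` returns `129.0`.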
### UDA benchmark (suite)
UDA provides **2,965** real-world documents and **29,590** expert-annotated Q&A pairs across domains, stressing **parsing**, **chunking**, and **retrieval** alongside generation. The **TatHybrid** subset corresponds to this dataset’s **14,703** examples on the `default` split.
## Data fields
| Column | Type | Description |
|--------|------|-------------|
| `input` | `string` | Natural-language **question** about the report. |
| `expected_output` | `string` | JSON: `answers` (with `answer`, `answer_type`, `scale`), `facts`, `derivation`. |
| `metadata` | struct | `benchmark_name` (`uda_tat_qa`), `benchmark_type` (`uda`), `split`, `sub_benchmark` (`tat_qa`), `value` (JSON string with ids such as `label_key`, `label_file`, `q_uid`, `doc_page_uid`). |
**Splits:** `default` — **14,703** examples.
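The YAML header lists `answers`, `facts`, and `derivation` as separate string columns, while the table above describes a single `expected_output` JSON string and a struct-valued `metadata`. The sketch below accepts either layout and returns a flat record; `parse_row` and its output keys are hypothetical names, and the sample row is assembled from the examples on this card.

```python
import json
from typing import Any, Dict


def _maybe_json(value: Any) -> Any:
    """Decode JSON-encoded strings (objects or arrays); pass everything else through."""
    if isinstance(value, str) and value.strip().startswith(("{", "[")):
        return json.loads(value)
    return value


def parse_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one row into question, gold answer fields, and document ids.

    Accepts either a single `expected_output` JSON string (as in the field table)
    or separate `answers` / `facts` / `derivation` columns (as in the YAML header).
    """
    if "expected_output" in row:
        gold = json.loads(row["expected_output"])
    else:
        gold = {key: row.get(key, "") for key in ("answers", "facts", "derivation")}
    answers = _maybe_json(gold.get("answers")) or {}
    meta = _maybe_json(row.get("metadata")) or {}
    ids = _maybe_json(meta.get("value")) or {}
    return {
        "question": row["input"],
        "answer": answers.get("answer"),
        "answer_type": answers.get("answer_type"),
        "scale": answers.get("scale", ""),
        "facts": _maybe_json(gold.get("facts")) or [],
        "derivation": gold.get("derivation") or "",
        "doc_key": ids.get("label_key"),
        "q_uid": ids.get("q_uid"),
    }


# A row assembled from the card's second example and its illustrative metadata.
sample = {
    "input": "What is the change in Interest cost on benefit obligation for "
             "pension benefits from December 31, 2018 and 2019?",
    "expected_output": json.dumps({
        "answers": {"answer": 129, "answer_type": "arithmetic", "scale": ""},
        "facts": ["1,673", "1,802"],
        "derivation": "1,802-1,673",
    }),
    "metadata": {"sub_benchmark": "tat_qa", "value": json.dumps({
        "label_key": "overseas-shipholding-group-inc_2019",
        "q_uid": "bbdcf6da614f34fdb63995661c81613f",
    })},
}
parsed = parse_row(sample)
```

With the Hugging Face `datasets` library installed, the same function can be mapped over the released rows, for example `for row in load_dataset("orgrctera/uda_tat_qa_qa", split="default"): parse_row(row)`; the split name comes from the YAML header.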
## Examples
**Example 1 — span answer (benefits)**
- **`input`:** `What benefits are provided by the company to qualifying domestic retirees and their eligible dependents?`
- **`expected_output`:**
```json
{
"answers": {
"answer": [
"certain postretirement health care and life insurance benefits"
],
"answer_type": "span",
"scale": ""
},
"facts": [
"certain postretirement health care and life insurance benefits"
],
"derivation": ""
}
```
**Example 2 — arithmetic answer (pension interest cost)**
- **`input`:** `What is the change in Interest cost on benefit obligation for pension benefits from December 31, 2018 and 2019?`
- **`expected_output`:**
```json
{
"answers": {
"answer": 129,
"answer_type": "arithmetic",
"scale": ""
},
"facts": ["1,673", "1,802"],
"derivation": "1,802-1,673"
}
```
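Arithmetic answers can be cross-checked against their `derivation` strings, as in Example 2 where `1,802-1,673` yields 129. The sketch below assumes derivations contain only numbers with comma thousands separators, `+`, `-`, `*`, `/`, and parentheses; it walks Python's `ast` rather than calling `eval`, so anything else raises `ValueError`. `eval_derivation` is a hypothetical helper.

```python
import ast
import operator

# Binary operators a derivation may use; everything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_derivation(derivation: str) -> float:
    """Evaluate an arithmetic derivation such as '1,802-1,673' without eval()."""
    try:
        # Remove thousands separators before parsing: "1,802" -> "1802".
        tree = ast.parse(derivation.replace(",", ""), mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"unparseable derivation: {derivation!r}") from exc

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.UAdd, ast.USub)):
            operand = _eval(node.operand)
            return -operand if isinstance(node.op, ast.USub) else operand
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError(f"unsupported element in derivation: {derivation!r}")

    return _eval(tree)
```

A row could then be flagged when `abs(eval_derivation(d) - float(answer))` exceeds a small tolerance; answers that were rounded for reporting would need a looser one.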
**`metadata.value` (illustrative):**
```json
{
"label_key": "overseas-shipholding-group-inc_2019",
"label_file": "tat_qa",
"q_uid": "bbdcf6da614f34fdb63995661c81613f",
"doc_page_uid": "2ef48dc98e756493f097d01acf8101a2"
}
```
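For retrieval experiments it can help to batch questions by source report, so each document is parsed and chunked once. A minimal sketch, assuming `label_key` identifies the report (as `overseas-shipholding-group-inc_2019` suggests) and each `metadata.value` is a JSON string with the keys shown above; `group_by_document` is a hypothetical helper.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_document(metadata_values: Iterable[str]) -> Dict[str, List[str]]:
    """Map each report key (`label_key`) to the `q_uid`s of its questions, in order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for raw in metadata_values:
        ids = json.loads(raw)
        groups[ids["label_key"]].append(ids["q_uid"])
    return dict(groups)
```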
## References
### TAT-QA — primary data lineage
Fengbin Zhu, Wenqiang Lei, Youcheng Huang, Chao Wang, Shuo Zhang, Jiancheng Lv, Fuli Feng, Tat-Seng Chua. **TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance.** *Proceedings of ACL 2021*, pages 3086–3101.
**Abstract (from the publication):** The paper introduces TAT-QA, with **16,552** questions over **2,757** hybrid contexts from financial reports—each context combines a table with related paragraphs. Questions require diverse **numerical reasoning**; answers may be spans or computed values. The authors propose **TAGOP** and report **58.0** F1 versus **90.8** F1 for human experts, motivating continued research on hybrid financial QA.
- **ACL Anthology:** [https://aclanthology.org/2021.acl-long.254/](https://aclanthology.org/2021.acl-long.254/)
- **arXiv:** [https://arxiv.org/abs/2105.07624](https://arxiv.org/abs/2105.07624)
- **Project page:** [https://nextplusplus.github.io/TAT-QA/](https://nextplusplus.github.io/TAT-QA/)
- **Reference HF dataset:** [next-tat/TAT-QA](https://huggingface.co/datasets/next-tat/TAT-QA)
- **Code:** [NExTplusplus/TAT-QA](https://github.com/NExTplusplus/TAT-QA)
### UDA — benchmark suite (TatHybrid / finance)
Yulong Hui, Yao Lu, Huanchen Zhang. **UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis.** *NeurIPS 2024* (Datasets and Benchmarks Track).
**Abstract (summary):** UDA aggregates thousands of documents and tens of thousands of expert Q&A pairs across finance, academia, and Wikipedia-style settings, with documents kept in **original** formats to evaluate **end-to-end** RAG and analysis—including **retrieval** and **layout** challenges.
- **arXiv:** [https://arxiv.org/abs/2406.15187](https://arxiv.org/abs/2406.15187)
- **NeurIPS proceedings:** [https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html](https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html)
- **Repository:** [https://github.com/qinchuanhui/UDA-Benchmark](https://github.com/qinchuanhui/UDA-Benchmark)
- **Related Hub aggregation:** [qinchuanhui/UDA-QA](https://huggingface.co/datasets/qinchuanhui/UDA-QA)
## Citation
If you use this dataset, cite **TAT-QA** and **UDA**, and reference this dataset record as appropriate:
```bibtex
@inproceedings{zhu-etal-2021-tat,
title = {TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance},
author = {Zhu, Fengbin and Lei, Wenqiang and Huang, Youcheng and Wang, Chao and Zhang, Shuo and Lv, Jiancheng and Feng, Fuli and Chua, Tat-Seng},
booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
year = {2021},
pages = {3086--3101}
}
```
```bibtex
@article{hui2024uda,
title = {UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis},
author = {Hui, Yulong and Lu, Yao and Zhang, Huanchen},
journal = {arXiv preprint arXiv:2406.15187},
year = {2024}
}
```
## License
Use this dataset in accordance with the **TAT-QA** and **UDA** licenses and terms for the underlying benchmarks. Confirm suitability for your use case (including commercial or redistribution constraints) via the original project pages and licenses.
Provided by: orgrctera



