orgrctera/uda_fin_qa_qa
Source: Hugging Face. Updated 2026-03-21; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/orgrctera/uda_fin_qa_qa
Resource description:
---
license: mit
language:
- en
pretty_name: UDA FinQA (Question Answering)
size_categories:
- 5K<n<10K
tags:
- finance
- question-answering
- numerical-reasoning
- unstructured-documents
- task:question-answering
configs:
- config_name: default
  data_files:
  - split: default
    path: data/default-*
dataset_info:
  features:
  - name: input
    dtype: string
  - name: metadata
    dtype: string
  - name: answers
    dtype: string
  - name: evidence
    dtype: string
  - name: context
    dtype: string
  - name: program
    dtype: string
  splits:
  - name: default
    num_bytes: 43214536
    num_examples: 8190
  download_size: 6961204
  dataset_size: 43214536
---
# UDA FinQA — Question Answering (`orgrctera/uda_fin_qa_qa`)
## Dataset description
This release is the **FinQA-aligned Question Answering (QA)** split from the **UDA (Unstructured Document Analysis)** benchmark: **8,190** expert-style question–answer instances grounded in real corporate financial disclosures (earnings materials, 10-K–style reports). Each row pairs a natural-language **question** with structured **supervision** (gold answers, supporting evidence, document context, and executable reasoning programs), so models can be trained or evaluated on **financial numerical reasoning** over heterogeneous evidence (narrative text and tables).
**UDA** (Hui et al., NeurIPS 2024) is a suite for **Retrieval-Augmented Generation (RAG)** and LLM-based analysis on **real-world** documents kept in original, messy formats. The finance track includes subsets such as **FinHybrid** (FinQA-style), designed to stress **parsing, alignment, and reasoning**—not only fluent generation.
**FinQA** (Chen et al., EMNLP 2021) is the foundational task: expert-written questions over financial reports with **multi-step numerical reasoning**, **heterogeneous evidence** (tables + text), and **explainable** annotations (including programs over quantities). This Hub dataset follows that task definition within UDA’s benchmark packaging (`sub_benchmark`: `fin_qa`).
## The task
- **Task type:** **Question Answering (QA)** for **FinQA**: given the question in `input`, predict the correct **numeric or textual answer** using information that would appear in the source report (tables and surrounding text). Evaluation typically compares predictions against the gold **string** and **executable** answers, and may also check **evidence** and **program** consistency under the official FinQA / UDA protocols.
- **Input (`input`):** A single English question about reported figures, trends, or relationships (e.g., interest expense, growth rates, year-over-year comparisons).
- **Target (`expected_output`):** A JSON **string** with:
  - **`answers`:** e.g. `str_answer` (normalized string) and `exe_answer` (numeric value where applicable).
  - **`evidence`:** Pointers to supporting **text** and **table** snippets (e.g. `text_1`, `table_1`).
  - **`context`:** Supporting **pre_text**, **post_text**, and **table** material aligned with the report excerpt.
  - **`program`:** A **gold reasoning program** over numbers and operations (FinQA-style), supporting interpretability and program-based metrics.
- **Metadata (`metadata`):** `benchmark_name` (`uda_fin_qa`), `benchmark_type` (`uda`), `split`, `sub_benchmark` (`fin_qa`), and `value` (JSON with identifiers such as `label_key`, `label_file`, `q_uid`).
**Splits:** Single split `default` with **8,190** examples (one Parquet shard: `data/default-00000-of-00001.parquet`).
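The following is a minimal sketch of loading the `default` split and decoding one row into Python objects, not an official loader. It assumes the Hugging Face `datasets` library is installed. Because the front matter lists `answers`, `evidence`, `context`, and `program` as separate string columns while the task description refers to a single `expected_output` JSON string, the hypothetical helper `decode_row` accepts either layout. The names `load_rows`, `decode_row`, and `_maybe_json` are illustrative only.

```python
import json
from typing import Any

REPO_ID = "orgrctera/uda_fin_qa_qa"  # dataset id from this card
TARGET_FIELDS = ("answers", "evidence", "context", "program")


def load_rows(repo_id: str = REPO_ID):
    """Load the single `default` split (needs the `datasets` package and network access)."""
    from datasets import load_dataset  # imported lazily so the helpers below work offline

    return load_dataset(repo_id, split="default")


def _maybe_json(value: Any) -> Any:
    """Decode a JSON string if possible; otherwise return the value unchanged."""
    if isinstance(value, str):
        try:
            return json.loads(value)
        except json.JSONDecodeError:
            return value
    return value


def decode_row(row: dict) -> dict:
    """Return the question, target fields, and metadata as Python objects.

    Assumption: the target is either one `expected_output` JSON string or
    separate `answers` / `evidence` / `context` / `program` columns.
    """
    if "expected_output" in row:
        target = _maybe_json(row["expected_output"])
    else:
        target = {name: _maybe_json(row.get(name)) for name in TARGET_FIELDS}
    metadata = _maybe_json(row.get("metadata"))
    if isinstance(metadata, dict) and "value" in metadata:
        # `value` may itself be a JSON string holding `label_key`, `label_file`, `q_uid`.
        metadata["value"] = _maybe_json(metadata["value"])
    return {"question": row["input"], "target": target, "metadata": metadata}
```

With network access, `decode_row(load_rows()[0])` would return the first example in this shape; the helper itself works on any dict with the same keys.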
## Background
### FinQA
FinQA was introduced to study **numerical reasoning over financial data**: questions are written by finance professionals over real filings; annotations include operations and facts that support the answer. The authors show that strong general-domain LMs still **trail experts** on finance-specific knowledge and **multi-step** numeric reasoning. FinQA remains a standard benchmark for **table+text** reasoning in finance.
### UDA benchmark
UDA aggregates **thousands** of real documents and **tens of thousands** of annotated Q&A pairs across domains (including finance), with documents provided in ways that reflect **realistic** ingestion (e.g., PDF/HTML) so that **retrieval, chunking, and parsing** choices matter. The FinQA-related finance portion (FinHybrid / FinQA track) matches the scale of this dataset (**8,190** QA instances).
## References
### FinQA (source task and annotations)
**Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan Routledge, William Yang Wang.** *FinQA: A Dataset of Numerical Reasoning over Financial Data.* **EMNLP 2021**, pages 3697–3711.
- **Abstract (summary):** The paper introduces a large-scale dataset of QA pairs over financial reports with **gold reasoning programs** and evidence for explainability; experiments show pretrained language models **fall short of human experts** on financial knowledge and complex numerical reasoning, motivating better structured and unstructured reasoning over filings.
- **ACL Anthology:** [https://aclanthology.org/2021.emnlp-main.300/](https://aclanthology.org/2021.emnlp-main.300/)
- **DOI:** [10.18653/v1/2021.emnlp-main.300](https://doi.org/10.18653/v1/2021.emnlp-main.300)
- **Original code and data:** [https://github.com/czyssrs/FinQA](https://github.com/czyssrs/FinQA)
- **Project site:** [https://finqasite.github.io/](https://finqasite.github.io/)
### UDA (benchmark suite containing this FinQA slice)
**Yulong Hui, Yao Lu, Huanchen Zhang.** *UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis.* **NeurIPS 2024** (Datasets and Benchmarks Track).
- **Abstract (summary):** UDA provides a benchmark for **RAG** in **real-world document analysis** with diverse domains and question types, using real documents and expert annotations; the work analyzes how **parsing and retrieval** interact with generation and highlights practical design choices for document AI systems.
- **arXiv:** [https://arxiv.org/abs/2406.15187](https://arxiv.org/abs/2406.15187)
- **NeurIPS proceedings:** [https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html](https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html)
- **Code:** [https://github.com/qinchuanhui/UDA-Benchmark](https://github.com/qinchuanhui/UDA-Benchmark)
### Related Hub resources
- Aggregated UDA QA reference: [qinchuanhui/UDA-QA](https://huggingface.co/datasets/qinchuanhui/UDA-QA)
## Examples
Illustrative rows from the dataset; long `context` blocks are abbreviated.
### Example 1 — interest expense
**`input`:**
```text
what is the the interest expense in 2009?
```
**`expected_output` (excerpt; `context` shortened):**
```json
{
"answers": {
"str_answer": "380",
"exe_answer": 3.8
},
"evidence": {
"text_1": "if libor changes by 100 basis points , our annual interest expense would change by $ 3.8 million ."
},
"context": {
"pre_text": [
"interest rate to a variable interest rate based on the three-month libor plus 2.05% ...",
"if libor changes by 100 basis points , our annual interest expense would change by $ 3.8 million .",
"..."
],
"post_text": ["..."]
},
"program": "divide(100, 100), divide(3.8, #0)"
}
```
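To illustrate how a `program` string relates to `exe_answer`, here is a small sketch of an evaluator for the step syntax shown in Example 1. It is not the official FinQA executor: it covers only four arithmetic operations and assumes `#k` refers to the result of step `k` (counting from 0). The function name `run_program` is hypothetical.

```python
import re

# Subset of FinQA-style operations; the full program language has more.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}
STEP = re.compile(r"(\w+)\(([^()]*)\)")  # matches e.g. divide(3.8, #0)


def run_program(program: str) -> float:
    """Evaluate steps left to right; `#k` is the result of step k (0-based)."""
    results: list[float] = []

    def value(token: str) -> float:
        token = token.strip()
        if token.startswith("#"):
            return results[int(token[1:])]
        return float(token)

    for op, args in STEP.findall(program):
        left, right = (value(arg) for arg in args.split(","))
        results.append(OPS[op](left, right))
    if not results:
        raise ValueError(f"no steps found in {program!r}")
    return results[-1]


print(run_program("divide(100, 100), divide(3.8, #0)"))  # 3.8
```

The first step yields 1.0 and the second divides 3.8 by it, which matches the `exe_answer` of 3.8 in the excerpt.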
### Example 2 — amortization growth
**`input`:**
```text
what is the expected growth rate in amortization expense in 2010?
```
**`expected_output` (excerpt):**
```json
{
"answers": {
"str_answer": "-27.0%",
"exe_answer": -0.26689
},
"evidence": {
"table_1": "fiscal years the 2010 of amortization expense is $ 5425 ;",
"text_2": "amortization expense from continuing operations , related to intangibles was $ 7.4 million , $ 9.3 million and $ 9.2 million in fiscal 2009 , 2008 and 2007 , respectively ."
},
"context": { "...": "..." },
"program": "subtract(1074.5, 1110.6), divide(#0, 1110.6)"
}
```
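As a worked check, the `exe_answer` in Example 2 can be reproduced from the two evidence snippets. The sketch below assumes the table figure of 5425 is in thousands of dollars (the excerpt does not state the unit) and compares it with the $7.4 million reported for fiscal 2009. The operands in the `program` excerpt do not appear in the evidence shown, so this sketch works from the evidence values instead.

```python
import math

amortization_2010_thousands = 5425.0  # from `table_1`; unit assumed to be $ thousands
amortization_2009_millions = 7.4      # from `text_2`, fiscal 2009


def growth_rate(new: float, old: float) -> float:
    """Relative change from `old` to `new`."""
    return (new - old) / old


rate = growth_rate(amortization_2010_thousands / 1000.0, amortization_2009_millions)
gold_exe_answer = -0.26689  # `exe_answer` from the excerpt

# Compare with a small absolute tolerance because the gold value is rounded.
matches = math.isclose(rate, gold_exe_answer, abs_tol=1e-5)
print(round(rate, 5), matches)  # -0.26689 True
```

A tolerance-based comparison like this is one plausible way to score numeric predictions against `exe_answer`; the official FinQA / UDA protocols may differ.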
**`metadata.value` (example):**
```json
{
"label_key": "ADI_2009",
"label_file": "fin_qa",
"q_uid": "ADI/2009/page_49.pdf-1"
}
```
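If the identifier needs to be split into its parts, a sketch like the following may help. The layout `<ticker>/<year>/page_<n>.pdf-<k>` is an assumption inferred from the single `q_uid` shown above, and the names `QuestionId` and `parse_q_uid` are hypothetical; other rows may not follow this pattern.

```python
import re
from typing import NamedTuple


class QuestionId(NamedTuple):
    ticker: str       # assumed to be a company identifier
    year: int
    page: int
    question_no: int  # trailing number after ".pdf-"


# Assumed layout: <ticker>/<year>/page_<n>.pdf-<k>, inferred from one example only.
Q_UID = re.compile(
    r"^(?P<ticker>[^/]+)/(?P<year>\d{4})/page_(?P<page>\d+)\.pdf-(?P<no>\d+)$"
)


def parse_q_uid(q_uid: str) -> QuestionId:
    match = Q_UID.match(q_uid)
    if match is None:
        raise ValueError(f"unexpected q_uid layout: {q_uid!r}")
    return QuestionId(
        match["ticker"], int(match["year"]), int(match["page"]), int(match["no"])
    )


print(parse_q_uid("ADI/2009/page_49.pdf-1"))
```

Raising on an unexpected layout, rather than guessing, keeps any mismatch with the real identifiers visible.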
## Citation
If you use this dataset, please cite **FinQA**, **UDA**, and this dataset record as appropriate:
```bibtex
@inproceedings{chen-etal-2021-finqa,
title = {FinQA: A Dataset of Numerical Reasoning over Financial Data},
author = {Chen, Zhiyu and Chen, Wenhu and Smiley, Charese and Shah, Sameena and Borova, Iana and Langdon, Dylan and Moussa, Reema and Beane, Matt and Huang, Ting-Hao and Routledge, Bryan and Wang, William Yang},
booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
year = {2021},
pages = {3697--3711}
}
```
```bibtex
@inproceedings{hui2024uda,
title = {UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis},
author = {Hui, Yulong and Lu, Yao and Zhang, Huanchen},
booktitle = {Advances in Neural Information Processing Systems},
year = {2024}
}
```
## License
The original FinQA release is under the **MIT License** (see the [FinQA repository](https://github.com/czyssrs/FinQA)). Use this dataset in compliance with the original data licenses and the UDA benchmark terms.
Provider:
orgrctera



