orgrctera/uda_fin_qa
Source: Hugging Face · Updated 2026-03-21 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/orgrctera/uda_fin_qa
Resource description:
---
license: mit
language:
- en
pretty_name: UDA FinQA (Retrieval)
size_categories:
- 5K<n<10K
tags:
- finance
- question-answering
- retrieval
- rag
- unstructured-documents
configs:
- config_name: default
  data_files:
  - split: default
    path: data/default-*
dataset_info:
  features:
  - name: input
    dtype: string
  - name: metadata
    dtype: string
  - name: answers
    dtype: string
  - name: evidence
    dtype: string
  - name: context
    dtype: string
  - name: program
    dtype: string
  splits:
  - name: default
    num_bytes: 43214536
    num_examples: 8190
  download_size: 6961204
  dataset_size: 43214536
---
# UDA FinQA (`orgrctera/uda_fin_qa`)
## Overview
This dataset is the **FinQA** slice of the **UDA (Unstructured Document Analysis)** benchmark: **8,190** question–answer instances derived from real financial reports, packaged for **retrieval-oriented** evaluation in RAG pipelines.
**UDA** is a benchmark suite for Retrieval-Augmented Generation (RAG) over messy, real-world documents (PDF/HTML) where evidence mixes narrative text and tables. The finance portion includes large subsets aligned with **FinQA**-style numerical reasoning over earnings materials.
**FinQA** (Chen et al., EMNLP 2021) is the foundational dataset of expert-written questions over financial reports, with heterogeneous evidence (tables + text) and multi-step numerical reasoning. UDA adopts FinQA-aligned labeling within its broader document-analysis benchmark (Hui et al., NeurIPS 2024 Datasets & Benchmarks).
In this Hub release, each row is a **retrieval task instance**: the model must locate the right evidence and use it to answer the question in `input`. The gold evidence is typically embedded in `expected_output` as structured context for scoring or teacher forcing. This matches the **FinQA / UDA** evaluation setting, where **retrieval quality** and parsing of unstructured financial documents are central.
## Task
- **Task type:** Retrieval (within a RAG / document-analysis pipeline) for **FinQA**-style financial QA.
- **Input:** A natural-language question (`input`) about reported figures or relationships in corporate financial disclosures.
- **Supervision / reference:** `expected_output` is a JSON string containing gold answers, evidence pointers, and document context (see below). The `metadata` column records UDA benchmark identifiers (`sub_benchmark`: `fin_qa`).
Models are typically evaluated by whether retrieved passages support the correct answer and whether generation matches gold reasoning or numeric targets, following the FinQA and UDA protocols.
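As a rough illustration of those two checks, the sketch below scores a retrieval result by how many gold evidence strings appear in the retrieved chunks, and scores a numeric prediction with a relative tolerance. The function names, the substring-matching rule, and the 1% tolerance are assumptions made for illustration; they are not the official FinQA or UDA metrics.

```python
import math


def evidence_recall(retrieved_chunks: list[str], evidence: dict[str, str]) -> float:
    """Fraction of gold evidence strings found verbatim in any retrieved chunk."""
    if not evidence:
        return 0.0
    hits = sum(
        any(gold.strip() in chunk for chunk in retrieved_chunks)
        for gold in evidence.values()
    )
    return hits / len(evidence)


def numeric_match(predicted: float, gold: float, rel_tol: float = 0.01) -> bool:
    """True when the prediction is within a relative tolerance of the gold value."""
    return math.isclose(predicted, gold, rel_tol=rel_tol)


# Toy data: the evidence string is the one quoted in Example 1 later in this card.
evidence = {
    "text_1": "if libor changes by 100 basis points , our annual interest "
              "expense would change by $ 3.8 million ."
}
chunks = [
    "unrelated passage about revenue recognition .",
    "... if libor changes by 100 basis points , our annual interest "
    "expense would change by $ 3.8 million . ...",
]

print(evidence_recall(chunks, evidence))   # 1.0
print(numeric_match(3.79, 3.8))            # True
```

Exact substring matching is deliberately strict; a real pipeline would likely need normalization (whitespace, tokenization, table linearization) before comparing retrieved text with gold evidence.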
## Background
### FinQA
FinQA targets **numerical reasoning over financial data**: questions are written by finance experts over real reports; annotations include explainable reasoning traces. The original work shows that general-domain pretrained LMs lag humans on finance-specific, multi-step numeric reasoning.
### UDA benchmark
UDA revisits RAG and LLM-based document analysis across domains (including finance) using **thousands of real documents** and **tens of thousands** of expert-annotated Q&A pairs. Documents are kept in their original formats to stress **parsing, chunking, and retrieval**, not only generation.
This dataset (**8,190** examples in the `default` split) corresponds in scale to the **FinQA-related** finance split that the UDA paper reports as part of its finance track.
## Data fields
| Column | Type | Description |
|--------|------|-------------|
| `input` | `string` | Question text posed over the report. |
| `expected_output` | `string` | JSON string with fields such as `answers` (e.g. `str_answer`, `exe_answer`), `evidence` (table/text references), and `context` (supporting pre/post text and table snippets). |
| `metadata` | struct | `benchmark_name` (`uda_fin_qa`), `benchmark_type` (`uda`), `split`, `sub_benchmark` (`fin_qa`), and `value` (JSON string with identifiers like `label_key`, `label_file`, `q_uid`). |
**Splits:** Single split `default` with **8,190** examples.
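A minimal sketch of decoding one row, assuming the layout in the table above: `expected_output` is a JSON string and `metadata` is a mapping whose `value` is itself a JSON string. The helper name `decode_row` and the sample row are illustrative only. The YAML header of this card lists `answers`, `evidence`, `context`, and `program` as separate string columns, so the schema of the published files may differ from this sketch.

```python
import json
from typing import Any


def decode_row(row: dict[str, Any]) -> dict[str, Any]:
    """Decode the JSON-encoded fields of one row into plain Python objects."""
    expected = json.loads(row["expected_output"])
    metadata = dict(row["metadata"])
    # `metadata.value` is described as a JSON string of identifiers.
    identifiers = json.loads(metadata["value"])
    return {
        "question": row["input"],
        "answers": expected.get("answers", {}),
        "evidence": expected.get("evidence", {}),
        "context": expected.get("context", {}),
        "q_uid": identifiers.get("q_uid"),
    }


# Hypothetical row assembled from values shown in this card; pairing this
# question with this q_uid is an assumption made for the example.
sample_row = {
    "input": "what is the expected growth rate in amortization expense in 2010?",
    "expected_output": json.dumps({
        "answers": {"str_answer": "-27.0%", "exe_answer": -0.26689},
        "evidence": {
            "table_1": "fiscal years the 2010 of amortization expense is $ 5425 ;"
        },
        "context": {},
    }),
    "metadata": {
        "benchmark_name": "uda_fin_qa",
        "sub_benchmark": "fin_qa",
        "value": json.dumps({
            "label_key": "ADI_2009",
            "label_file": "fin_qa",
            "q_uid": "ADI/2009/page_49.pdf-1",
        }),
    },
}

decoded = decode_row(sample_row)
print(decoded["answers"]["exe_answer"], decoded["q_uid"])
```

With the `datasets` library installed, rows could presumably be obtained with `load_dataset("orgrctera/uda_fin_qa", split="default")` and passed to the same helper, adjusting the column names if the loaded schema differs.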
## Examples
Example rows are illustrative; long `context` blocks are abbreviated.
**Example 1 — interest expense**
- **`input`:** `what is the the interest expense in 2009?`
- **`expected_output` (excerpt):**
```json
{
  "answers": {"str_answer": "380", "exe_answer": 3.8},
  "evidence": {
    "text_1": "if libor changes by 100 basis points , our annual interest expense would change by $ 3.8 million ."
  },
  "context": {
    "pre_text": [
      "interest rate to a variable interest rate based on the three-month libor plus 2.05% ...",
      "if libor changes by 100 basis points , our annual interest expense would change by $ 3.8 million .",
      "..."
    ]
  }
}
```
**Example 2 — amortization growth**
- **`input`:** `what is the expected growth rate in amortization expense in 2010?`
- **`expected_output` (excerpt):**
```json
{
  "answers": {"str_answer": "-27.0%", "exe_answer": -0.26689},
  "evidence": {
    "table_1": "fiscal years the 2010 of amortization expense is $ 5425 ;",
    "text_2": "amortization expense from continuing operations , related to intangibles was $ 7.4 million , $ 9.3 million and $ 9.2 million in fiscal 2009 , 2008 and 2007 , respectively ."
  },
  "context": { "...": "..." }
}
```
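The `exe_answer` in Example 2 can be reproduced from the two evidence items. This is a worked check, not the dataset's gold program: it assumes the table value `5425` is in thousands of dollars (so $5.425 million) and that growth is measured against the fiscal 2009 figure of $7.4 million.

```python
# Evidence values quoted in Example 2.
amortization_2010_thousands = 5425   # table_1, assumed to be in thousands of dollars
amortization_2009_millions = 7.4     # text_2, fiscal 2009

# Put both figures in millions before comparing.
amortization_2010_millions = amortization_2010_thousands / 1000

# Growth rate = (new - old) / old.
growth_rate = (
    amortization_2010_millions - amortization_2009_millions
) / amortization_2009_millions

print(round(growth_rate, 5))  # -0.26689
```

The result is about -26.7%, which agrees with `exe_answer`; the `str_answer` of `-27.0%` appears to be the same quantity at a coarser rounding.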
**`metadata.value` (example):**
```json
{"label_key": "ADI_2009", "label_file": "fin_qa", "q_uid": "ADI/2009/page_49.pdf-1"}
```
## References
### FinQA (source task & data lineage)
Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan Routledge, William Yang Wang. **FinQA: A Dataset of Numerical Reasoning over Financial Data.** *EMNLP 2021*, pages 3697–3711.
- **Abstract (short):** Introduces a large-scale dataset of QA pairs over financial reports with gold reasoning programs for explainability; shows that large pretrained models fall short of experts on finance knowledge and multi-step numerical reasoning.
- **ACL Anthology:** [https://aclanthology.org/2021.emnlp-main.300/](https://aclanthology.org/2021.emnlp-main.300/)
- **DOI:** [10.18653/v1/2021.emnlp-main.300](https://doi.org/10.18653/v1/2021.emnlp-main.300)
- **Code & data (original release):** [https://github.com/czyssrs/FinQA](https://github.com/czyssrs/FinQA)
### UDA benchmark (suite containing this FinQA slice)
Yulong Hui, Yao Lu, Huanchen Zhang. **UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis.** *NeurIPS 2024* (Datasets and Benchmarks Track).
- **Abstract (short):** Presents UDA with thousands of real-world documents and tens of thousands of expert-annotated Q&A pairs; evaluates LLM- and RAG-based document analysis and highlights parsing and retrieval design choices.
- **arXiv:** [https://arxiv.org/abs/2406.15187](https://arxiv.org/abs/2406.15187)
- **arXiv DOI:** [10.48550/arXiv.2406.15187](https://doi.org/10.48550/arXiv.2406.15187)
- **NeurIPS proceedings:** [https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html](https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html)
- **Code & resources:** [https://github.com/qinchuanhui/UDA-Benchmark](https://github.com/qinchuanhui/UDA-Benchmark)
### Related Hub resources
- UDA QA aggregation (reference): [qinchuanhui/UDA-QA](https://huggingface.co/datasets/qinchuanhui/UDA-QA)
## Citation
If you use this dataset, please cite **both** FinQA and UDA (and this dataset record as appropriate):
```bibtex
@inproceedings{chen-etal-2021-finqa,
title = {FinQA: A Dataset of Numerical Reasoning over Financial Data},
author = {Chen, Zhiyu and Chen, Wenhu and Smiley, Charese and Shah, Sameena and Borova, Iana and Langdon, Dylan and Moussa, Reema and Beane, Matt and Huang, Ting-Hao and Routledge, Bryan and Wang, William Yang},
booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
year = {2021},
pages = {3697--3711}
}
```
```bibtex
@article{hui2024uda,
title = {UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis},
author = {Hui, Yulong and Lu, Yao and Zhang, Huanchen},
journal = {arXiv preprint arXiv:2406.15187},
year = {2024}
}
```
## License
The original FinQA release is under the **MIT License** (see the [FinQA repository](https://github.com/czyssrs/FinQA)). Use this dataset in compliance with the original data licenses and the UDA benchmark terms.
Provider:
orgrctera



