orgrctera/uda_fin_qa_qa

Hugging Face | Updated 2026-03-21 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/orgrctera/uda_fin_qa_qa
Resource overview:
---
license: mit
language:
- en
pretty_name: UDA FinQA (Question Answering)
size_categories:
- 5K<n<10K
tags:
- finance
- question-answering
- numerical-reasoning
- unstructured-documents
- task:question-answering
configs:
- config_name: default
  data_files:
  - split: default
    path: data/default-*
dataset_info:
  features:
  - name: input
    dtype: string
  - name: metadata
    dtype: string
  - name: answers
    dtype: string
  - name: evidence
    dtype: string
  - name: context
    dtype: string
  - name: program
    dtype: string
  splits:
  - name: default
    num_bytes: 43214536
    num_examples: 8190
  download_size: 6961204
  dataset_size: 43214536
---

# UDA FinQA: Question Answering (`orgrctera/uda_fin_qa_qa`)

## Dataset description

This release is the **FinQA-aligned Question Answering (QA)** split from the **UDA (Unstructured Document Analysis)** benchmark: **8,190** expert-style question-answer instances grounded in real corporate financial disclosures (earnings materials, 10-K-style reports). Each row pairs a natural-language **question** with structured **supervision** (gold answers, supporting evidence, document context, and executable reasoning programs), so models can be trained or evaluated on **financial numerical reasoning** over heterogeneous evidence (narrative text and tables).

**UDA** (Hui et al., NeurIPS 2024) is a suite for **Retrieval-Augmented Generation (RAG)** and LLM-based analysis on **real-world** documents kept in original, messy formats. The finance track includes subsets such as **FinHybrid** (FinQA-style), designed to stress **parsing, alignment, and reasoning**, not only fluent generation.

**FinQA** (Chen et al., EMNLP 2021) is the foundational task: expert-written questions over financial reports with **multi-step numerical reasoning**, **heterogeneous evidence** (tables + text), and **explainable** annotations (including programs over quantities). This Hub dataset follows that task definition within UDA’s benchmark packaging (`sub_benchmark`: `fin_qa`).
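As a minimal sketch of how the split described in the card metadata might be loaded, the snippet below uses the Hugging Face `datasets` library. The helper names `missing_columns` and `load_finqa_split` are hypothetical and introduced only for illustration; the repository ID, split name, and column names are taken from the metadata above, and whether the hosted files match them is an assumption.

```python
REPO_ID = "orgrctera/uda_fin_qa_qa"
SPLIT = "default"

# Column names as listed under dataset_info.features in the card metadata.
EXPECTED_COLUMNS = ["input", "metadata", "answers", "evidence", "context", "program"]


def missing_columns(column_names):
    """Return the expected columns that are absent from column_names."""
    present = set(column_names)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def load_finqa_split():
    """Download the split from the Hub (needs network access and `datasets`)."""
    # Imported lazily so the helper above can be used offline.
    from datasets import load_dataset

    dataset = load_dataset(REPO_ID, split=SPLIT)
    missing = missing_columns(dataset.column_names)
    if missing:
        # Hypothetical safeguard: the hosted schema may differ from the card.
        raise ValueError(f"columns missing from the downloaded split: {missing}")
    return dataset
```

Calling `load_finqa_split()` downloads the data; if the hosted copy matches the card, `len(dataset)` should be 8190.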
## The task

- **Task type:** **Question Answering (QA)** for **FinQA**: given the question in `input`, predict the correct **numeric or textual answer** using information that would appear in the source report (tables and surrounding text). Evaluation typically uses gold **string** and **executable** answers and may use **evidence** and **program** consistency with the official FinQA / UDA protocols.
- **Input (`input`):** A single English question about reported figures, trends, or relationships (e.g., interest expense, growth rates, year-over-year comparisons).
- **Target (`expected_output`):** A JSON **string** with:
  - **`answers`:** e.g. `str_answer` (normalized string) and `exe_answer` (numeric value where applicable).
  - **`evidence`:** Pointers to supporting **text** and **table** snippets (e.g. `text_1`, `table_1`).
  - **`context`:** Supporting **pre_text**, **post_text**, and **table** material aligned with the report excerpt.
  - **`program`:** A **gold reasoning program** over numbers and operations (FinQA-style), supporting interpretability and program-based metrics.
- **Metadata (`metadata`):** `benchmark_name` (`uda_fin_qa`), `benchmark_type` (`uda`), `split`, `sub_benchmark` (`fin_qa`), and `value` (JSON with identifiers such as `label_key`, `label_file`, `q_uid`).

**Splits:** Single split `default` with **8,190** examples (one Parquet shard: `data/default-00000-of-00001.parquet`).

## Background

### FinQA

FinQA was introduced to study **numerical reasoning over financial data**: questions are written by finance professionals over real filings; annotations include operations and facts that support the answer. The authors show that strong general-domain LMs still **trail experts** on finance-specific knowledge and **multi-step** numeric reasoning. FinQA remains a standard benchmark for **table+text** reasoning in finance.
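To make the `program` and `exe_answer` fields concrete, here is a minimal sketch of decoding a target and checking that its program reproduces the gold numeric answer. It is not the official FinQA or UDA evaluator. The sample target is abbreviated from the first example row in this card; the operation table, the `const_` handling, and the helper names `resolve_argument` and `run_program` are assumptions made for illustration, and table operations are left out.

```python
import json
import math
import re

# Arithmetic operations assumed from the FinQA program format; table
# operations (e.g. table_sum) are intentionally not covered by this sketch.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

# One step looks like "divide(3.8, #0)": an operation name and its arguments.
STEP_RE = re.compile(r"(\w+)\(([^()]*)\)")


def resolve_argument(token, results):
    """Turn one argument token into a float."""
    token = token.strip()
    if token.startswith("#"):
        # "#n" refers to the result of step n (0-based).
        return results[int(token[1:])]
    if token.startswith("const_"):
        # Assumed convention: const_100 -> 100, const_m1 -> -1.
        value = token[len("const_"):]
        if value.startswith("m"):
            value = "-" + value[1:]
        return float(value)
    return float(token)


def run_program(program):
    """Execute the steps in order and return the last result."""
    results = []
    for op_name, raw_args in STEP_RE.findall(program):
        if op_name not in OPS:
            raise ValueError(f"unsupported operation: {op_name}")
        args = [resolve_argument(tok, results) for tok in raw_args.split(",")]
        results.append(OPS[op_name](*args))
    if not results:
        raise ValueError("no executable steps found")
    return results[-1]


# Abbreviated target in the shape of the first example row in this card.
EXPECTED_OUTPUT = json.dumps({
    "answers": {"str_answer": "380", "exe_answer": 3.8},
    "program": "divide(100, 100), divide(3.8, #0)",
})

target = json.loads(EXPECTED_OUTPUT)
computed = run_program(target["program"])
matches_gold = math.isclose(computed, target["answers"]["exe_answer"], rel_tol=1e-4)
print(computed, matches_gold)  # prints: 3.8 True
```

The relative tolerance is an arbitrary choice for this sketch; a real evaluation should follow the tolerance and normalization rules of the official protocol.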
### UDA benchmark

UDA aggregates **thousands** of real documents and **tens of thousands** of annotated Q&A pairs across domains (including finance), with documents provided in ways that reflect **realistic** ingestion (e.g., PDF/HTML) so that **retrieval, chunking, and parsing** choices matter. The FinQA-related finance portion (FinHybrid / FinQA track) matches the scale of this dataset (**8,190** QA instances).

## References

### FinQA (source task and annotations)

**Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan Routledge, William Yang Wang.** *FinQA: A Dataset of Numerical Reasoning over Financial Data.* **EMNLP 2021**, pages 3697–3711.

- **Abstract (summary):** The paper introduces a large-scale dataset of QA pairs over financial reports with **gold reasoning programs** and evidence for explainability; experiments show pretrained language models **fall short of human experts** on financial knowledge and complex numerical reasoning, motivating better structured and unstructured reasoning over filings.
- **ACL Anthology:** [https://aclanthology.org/2021.emnlp-main.300/](https://aclanthology.org/2021.emnlp-main.300/)
- **DOI:** [10.18653/v1/2021.emnlp-main.300](https://doi.org/10.18653/v1/2021.emnlp-main.300)
- **Original code and data:** [https://github.com/czyssrs/FinQA](https://github.com/czyssrs/FinQA)
- **Project site:** [https://finqasite.github.io/](https://finqasite.github.io/)

### UDA (benchmark suite containing this FinQA slice)

**Yulong Hui, Yao Lu, Huanchen Zhang.** *UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis.* **NeurIPS 2024** (Datasets and Benchmarks Track).
- **Abstract (summary):** UDA provides a benchmark for **RAG** in **real-world document analysis** with diverse domains and question types, using real documents and expert annotations; the work analyzes how **parsing and retrieval** interact with generation and highlights practical design choices for document AI systems.
- **arXiv:** [https://arxiv.org/abs/2406.15187](https://arxiv.org/abs/2406.15187)
- **NeurIPS proceedings:** [https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html](https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html)
- **Code:** [https://github.com/qinchuanhui/UDA-Benchmark](https://github.com/qinchuanhui/UDA-Benchmark)

### Related Hub resources

- Aggregated UDA QA reference: [qinchuanhui/UDA-QA](https://huggingface.co/datasets/qinchuanhui/UDA-QA)

## Examples

Illustrative rows from the dataset; long `context` blocks are abbreviated.

### Example 1: interest expense

**`input`:**

```text
what is the the interest expense in 2009?
```

**`expected_output` (excerpt; `context` shortened):**

```json
{
  "answers": { "str_answer": "380", "exe_answer": 3.8 },
  "evidence": {
    "text_1": "if libor changes by 100 basis points , our annual interest expense would change by $ 3.8 million ."
  },
  "context": {
    "pre_text": [
      "interest rate to a variable interest rate based on the three-month libor plus 2.05% ...",
      "if libor changes by 100 basis points , our annual interest expense would change by $ 3.8 million .",
      "..."
    ],
    "post_text": ["..."]
  },
  "program": "divide(100, 100), divide(3.8, #0)"
}
```

### Example 2: amortization growth

**`input`:**

```text
what is the expected growth rate in amortization expense in 2010?
```

**`expected_output` (excerpt):**

```json
{
  "answers": { "str_answer": "-27.0%", "exe_answer": -0.26689 },
  "evidence": {
    "table_1": "fiscal years the 2010 of amortization expense is $ 5425 ;",
    "text_2": "amortization expense from continuing operations , related to intangibles was $ 7.4 million , $ 9.3 million and $ 9.2 million in fiscal 2009 , 2008 and 2007 , respectively ."
  },
  "context": { "...": "..." },
  "program": "subtract(1074.5, 1110.6), divide(#0, 1110.6)"
}
```

**`metadata.value` (example):**

```json
{
  "label_key": "ADI_2009",
  "label_file": "fin_qa",
  "q_uid": "ADI/2009/page_49.pdf-1"
}
```

## Citation

If you use this dataset, please cite **FinQA**, **UDA**, and this dataset record as appropriate:

```bibtex
@inproceedings{chen-etal-2021-finqa,
  title     = {FinQA: A Dataset of Numerical Reasoning over Financial Data},
  author    = {Chen, Zhiyu and Chen, Wenhu and Smiley, Charese and Shah, Sameena and Borova, Iana and Langdon, Dylan and Moussa, Reema and Beane, Matt and Huang, Ting-Hao and Routledge, Bryan and Wang, William Yang},
  booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
  year      = {2021},
  pages     = {3697--3711}
}
```

```bibtex
@inproceedings{hui2024uda,
  title     = {UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis},
  author    = {Hui, Yulong and Lu, Yao and Zhang, Huanchen},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2024}
}
```

## License

The original FinQA release is under the **MIT License** (see the [FinQA repository](https://github.com/czyssrs/FinQA)). Use this dataset in compliance with the original data licenses and the UDA benchmark terms.
Provider:
orgrctera