
orgrctera/uda_fin_qa

Source: Hugging Face. Updated 2026-03-21; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/orgrctera/uda_fin_qa
---
license: mit
language:
- en
pretty_name: UDA FinQA (Retrieval)
size_categories:
- 5K<n<10K
tags:
- finance
- question-answering
- retrieval
- rag
- unstructured-documents
configs:
- config_name: default
  data_files:
  - split: default
    path: data/default-*
dataset_info:
  features:
  - name: input
    dtype: string
  - name: metadata
    dtype: string
  - name: answers
    dtype: string
  - name: evidence
    dtype: string
  - name: context
    dtype: string
  - name: program
    dtype: string
  splits:
  - name: default
    num_bytes: 43214536
    num_examples: 8190
  download_size: 6961204
  dataset_size: 43214536
---

# UDA FinQA (`orgrctera/uda_fin_qa`)

## Overview

This dataset is the **FinQA** slice of the **UDA (Unstructured Document Analysis)** benchmark: **8,190** question–answer instances derived from real financial reports, packaged for **retrieval-oriented** evaluation in RAG pipelines.

**UDA** is a benchmark suite for Retrieval-Augmented Generation (RAG) over messy, real-world documents (PDF/HTML) where evidence mixes narrative text and tables. The finance portion includes large subsets aligned with **FinQA**-style numerical reasoning over earnings materials.

**FinQA** (Chen et al., EMNLP 2021) is the foundational dataset of expert-written questions over financial reports, with heterogeneous evidence (tables + text) and multi-step numerical reasoning. UDA adopts FinQA-aligned labeling within its broader document-analysis benchmark (Hui et al., NeurIPS 2024 Datasets & Benchmarks).

In this Hub release, each row is a **retrieval task instance**: the model must locate and use the right evidence (typically embedded in `expected_output` as structured context for scoring or teacher forcing) to answer the question in `input`, consistent with the **FinQA / UDA** evaluation setting where **retrieval quality** and parsing of unstructured financial documents are central.

## Task

- **Task type:** Retrieval (within a RAG / document-analysis pipeline) for **FinQA**-style financial QA.
- **Input:** A natural-language question (`input`) about reported figures or relationships in corporate financial disclosures.
- **Supervision / reference:** `expected_output` is a JSON string containing gold answers, evidence pointers, and document context (see below). Metadata records UDA benchmark identifiers (`sub_benchmark`: `fin_qa`).

Models are typically evaluated by whether retrieved passages support the correct answer and whether generation matches gold reasoning or numeric targets, following the FinQA and UDA protocols.

## Background

### FinQA

FinQA targets **numerical reasoning over financial data**: questions are written by finance experts over real reports; annotations include explainable reasoning traces. The original work shows that general-domain pretrained LMs lag humans on finance-specific, multi-step numeric reasoning.

### UDA benchmark

UDA revisits RAG and LLM-based document analysis across domains (including finance) using **thousands of real documents** and **tens of thousands** of expert-annotated Q&A pairs, with documents kept in original formats to stress **parsing, chunking, and retrieval**, not only generation. The **FinQA-related** finance split in UDA (reported in the UDA paper as part of the finance track) corresponds to the scale of this dataset (**8,190** examples in the `default` split here).

## Data fields

| Column | Type | Description |
|--------|------|-------------|
| `input` | `string` | Question text posed over the report. |
| `expected_output` | `string` | JSON string with fields such as `answers` (e.g. `str_answer`, `exe_answer`), `evidence` (table/text references), and `context` (supporting pre/post text and table snippets). |
| `metadata` | struct | `benchmark_name` (`uda_fin_qa`), `benchmark_type` (`uda`), `split`, `sub_benchmark` (`fin_qa`), and `value` (JSON string with identifiers like `label_key`, `label_file`, `q_uid`). |

**Splits:** Single split `default` with **8,190** examples.
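The field layout above can be exercised with a short sketch. It assumes `expected_output` is a JSON string with `answers`, `evidence`, and `context` keys as described in the table; the helper names (`parse_expected_output`, `normalize`, `evidence_recall`) and the substring-based recall metric are hypothetical illustrations, not the official FinQA or UDA scorer. The sample row reuses the amortization example from this card, with `context` left empty.

```python
import json
from typing import Any


def parse_expected_output(raw: str) -> dict[str, Any]:
    """Decode the JSON string stored in `expected_output` (assumed layout)."""
    record = json.loads(raw)
    return {
        "answers": record.get("answers", {}),
        "evidence": record.get("evidence", {}),
        "context": record.get("context", {}),
    }


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so spacing differences do not matter."""
    return " ".join(text.lower().split())


def evidence_recall(evidence: dict[str, str], retrieved: list[str]) -> float:
    """Fraction of gold evidence strings found verbatim in a retrieved passage.

    Hypothetical metric for illustration only; not the official UDA scorer.
    """
    if not evidence:
        return 0.0
    passages = [normalize(p) for p in retrieved]
    hits = sum(
        any(normalize(gold) in passage for passage in passages)
        for gold in evidence.values()
    )
    return hits / len(evidence)


# Sample row modeled on the amortization example in this card (context omitted).
sample_expected_output = json.dumps({
    "answers": {"str_answer": "-27.0%", "exe_answer": -0.26689},
    "evidence": {
        "table_1": "fiscal years the 2010 of amortization expense is $ 5425 ;",
        "text_2": (
            "amortization expense from continuing operations , related to "
            "intangibles was $ 7.4 million , $ 9.3 million and $ 9.2 million "
            "in fiscal 2009 , 2008 and 2007 , respectively ."
        ),
    },
    "context": {},
})

parsed = parse_expected_output(sample_expected_output)

# A retriever that returned only the table row finds one of two evidence items.
retrieved_passages = [
    "Fiscal years the 2010 of amortization expense is $ 5425 ; ...",
]
recall = evidence_recall(parsed["evidence"], retrieved_passages)
print(parsed["answers"]["str_answer"], recall)
```

To run this against the real data, the rows could be fetched with `datasets.load_dataset("orgrctera/uda_fin_qa", split="default")`. Since the YAML header lists `answers`, `evidence`, and `context` as separate string columns, it is worth checking `column_names` on the loaded split before assuming a single `expected_output` field.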
## Examples

Example rows are illustrative; long `context` blocks are abbreviated.

**Example 1: interest expense**

- **`input`:** `what is the the interest expense in 2009?`
- **`expected_output` (excerpt):**

```json
{
  "answers": {"str_answer": "380", "exe_answer": 3.8},
  "evidence": {
    "text_1": "if libor changes by 100 basis points , our annual interest expense would change by $ 3.8 million ."
  },
  "context": {
    "pre_text": [
      "interest rate to a variable interest rate based on the three-month libor plus 2.05% ...",
      "if libor changes by 100 basis points , our annual interest expense would change by $ 3.8 million .",
      "..."
    ]
  }
}
```

**Example 2: amortization growth**

- **`input`:** `what is the expected growth rate in amortization expense in 2010?`
- **`expected_output` (excerpt):**

```json
{
  "answers": {"str_answer": "-27.0%", "exe_answer": -0.26689},
  "evidence": {
    "table_1": "fiscal years the 2010 of amortization expense is $ 5425 ;",
    "text_2": "amortization expense from continuing operations , related to intangibles was $ 7.4 million , $ 9.3 million and $ 9.2 million in fiscal 2009 , 2008 and 2007 , respectively ."
  },
  "context": { "...": "..." }
}
```

**`metadata.value` (example):**

```json
{"label_key": "ADI_2009", "label_file": "fin_qa", "q_uid": "ADI/2009/page_49.pdf-1"}
```

## References

### FinQA (source task & data lineage)

Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan Routledge, William Yang Wang. **FinQA: A Dataset of Numerical Reasoning over Financial Data.** *EMNLP 2021*, pages 3697–3711.

- **Abstract (short):** Introduces a large-scale dataset of QA pairs over financial reports with gold reasoning programs for explainability; shows that large pretrained models fall short of experts on finance knowledge and multi-step numerical reasoning.
- **ACL Anthology:** [https://aclanthology.org/2021.emnlp-main.300/](https://aclanthology.org/2021.emnlp-main.300/)
- **DOI:** [10.18653/v1/2021.emnlp-main.300](https://doi.org/10.18653/v1/2021.emnlp-main.300)
- **Code & data (original release):** [https://github.com/czyssrs/FinQA](https://github.com/czyssrs/FinQA)

### UDA benchmark (suite containing this FinQA slice)

Yulong Hui, Yao Lu, Huanchen Zhang. **UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis.** *NeurIPS 2024* (Datasets and Benchmarks Track).

- **Abstract (short):** Presents UDA with thousands of real-world documents and tens of thousands of expert-annotated Q&A pairs; evaluates LLM- and RAG-based document analysis and highlights parsing and retrieval design choices.
- **arXiv:** [https://arxiv.org/abs/2406.15187](https://arxiv.org/abs/2406.15187)
- **arXiv DOI:** [10.48550/arXiv.2406.15187](https://doi.org/10.48550/arXiv.2406.15187)
- **NeurIPS proceedings:** [https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html](https://proceedings.neurips.cc/paper_files/paper/2024/hash/7c06759d1a8567f087b02e8589454917-Abstract-Datasets_and_Benchmarks_Track.html)
- **Code & resources:** [https://github.com/qinchuanhui/UDA-Benchmark](https://github.com/qinchuanhui/UDA-Benchmark)

### Related Hub resources

- UDA QA aggregation (reference): [qinchuanhui/UDA-QA](https://huggingface.co/datasets/qinchuanhui/UDA-QA)

## Citation

If you use this dataset, please cite **both** FinQA and UDA (and this dataset record as appropriate):

```bibtex
@inproceedings{chen-etal-2021-finqa,
  title = {FinQA: A Dataset of Numerical Reasoning over Financial Data},
  author = {Chen, Zhiyu and Chen, Wenhu and Smiley, Charese and Shah, Sameena and Borova, Iana and Langdon, Dylan and Moussa, Reema and Beane, Matt and Huang, Ting-Hao and Routledge, Bryan and Wang, William Yang},
  booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
  year = {2021},
  pages = {3697--3711}
}
```

```bibtex
@article{hui2024uda,
  title = {UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis},
  author = {Hui, Yulong and Lu, Yao and Zhang, Huanchen},
  journal = {arXiv preprint arXiv:2406.15187},
  year = {2024}
}
```

## License

The original FinQA release is under the **MIT License** (see the [FinQA repository](https://github.com/czyssrs/FinQA)). Use this dataset in compliance with the original data licenses and the UDA benchmark terms.
Provider: orgrctera