
next-tat/TAT-QA

Source: Hugging Face. Updated 2024-10-11. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/next-tat/TAT-QA
Resource overview:
---
license: cc-by-4.0
task_categories:
- question-answering
language:
- en
tags:
- finance
- table-text
- discrete_reasoning
- numerical_reasoning
size_categories:
- 10K<n<100K
---

# TAT-QA

- [**Project Page**](https://nextplusplus.github.io/TAT-QA/)
- [**Paper - ACL 21**](https://aclanthology.org/2021.acl-long.254/)
- [**Paper - Arxiv**](https://arxiv.org/abs/2105.07624)
- [**Source Code**](https://github.com/NExTplusplus/TAT-QA)
- [**Leaderboard**](https://nextplusplus.github.io/TAT-QA/#leaderboard)

TAT-QA (Tabular And Textual dataset for Question Answering) is a large-scale QA dataset that aims to stimulate progress in QA research over more complex and realistic tabular and textual data, especially data requiring numerical reasoning.

The unique features of TAT-QA include:

- The context is hybrid, comprising a semi-structured table and at least two relevant paragraphs that describe, analyze or complement the table;
- The questions are written by people with rich financial knowledge, and most are practical;
- The answer forms are diverse, including single span, multiple spans and free-form;
- Answering the questions usually requires various numerical reasoning capabilities, including addition (+), subtraction (-), multiplication (x), division (/), counting, comparison, sorting, and their compositions;
- In addition to the ground-truth answers, the corresponding derivations and scale are provided where applicable.

In total, TAT-QA contains 16,552 questions associated with 2,757 hybrid contexts from real-world financial reports.

For more details, please refer to the project page: https://nextplusplus.github.io/TAT-QA/

## Data Format

```python
{
  "table": {                                          # The tabular data in a hybrid context
    "uid": "3ffd9053-a45d-491c-957a-1b2fa0af0570",    # The unique id of a table
    "table": [                                        # The table content, a 2D array
      [ "", "2019", "2018", "2017" ],
      [ "Fixed Price", "$ 1,452.4", "$ 1,146.2", "$ 1,036.9" ],
      ...
    ]
  },
  "paragraphs": [                                     # The textual data in a hybrid context: at least two paragraphs associated with the table
    {
      "uid": "f4ac7069-10a2-47e9-995c-3903293b3d47",  # The unique id of a paragraph
      "order": 1,                                     # The order of the paragraph among all associated paragraphs, starting from 1
      "text": "Sales by Contract Type: Substantially all of our contracts are fixed-price type contracts. Sales included in Other contract types represent cost plus and time and material type contracts."  # The content of the paragraph
    },
    ...
  ],
  "questions": [                                      # The questions associated with the hybrid context
    {
      "uid": "eb787966-fa02-401f-bfaf-ccabf3828b23",  # The unique id of a question
      "order": 2,                                     # The order of the question among all questions, starting from 1
      "question": "What is the change in Other in 2019 from 2018?",  # The question itself
      "answer": -12.6,                                # The ground-truth answer
      "derivation": "44.1 - 56.7",                    # The derivation that can be executed to arrive at the ground-truth answer
      "answer_type": "arithmetic",                    # The answer type: `span`, `spans`, `arithmetic` or `counting`
      "answer_from": "table-text",                    # The source of the answer: `table`, `text` or `table-text`
      "rel_paragraphs": [                             # The orders of the paragraphs relied on to infer the answer, if any
        "2"
      ],
      "req_comparison": false,                        # Whether `comparison/sorting` is needed to answer a question whose answer is a single span or multiple spans
      "scale": "million"                              # The scale of the answer: `None`, `thousand`, `million`, `billion` or `percent`
    }
  ]
}
```

## Citation

```bibtex
@inproceedings{zhu2021tat,
    title = "{TAT}-{QA}: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance",
    author = "Zhu, Fengbin and Lei, Wenqiang and Huang, Youcheng and Wang, Chao and Zhang, Shuo and Lv, Jiancheng and Feng, Fuli and Chua, Tat-Seng",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.254",
    doi = "10.18653/v1/2021.acl-long.254",
    pages = "3277--3287"
}
```
Provider:
next-tat