next-tat/TAT-DQA
Hugging Face · updated 2024-10-11 · indexed 2025-04-12
Download link:
https://hf-mirror.com/datasets/next-tat/TAT-DQA
Resource summary:
---
license: cc-by-4.0
task_categories:
- question-answering
language:
- en
tags:
- finance
- table-text
- visual-document-QA
- numerical-reasoning
size_categories:
- 10K<n<100K
---
# TAT-DQA
- [**Project Page**](https://nextplusplus.github.io/TAT-DQA/)
- [**Paper - MM 22**](https://dl.acm.org/doi/abs/10.1145/3503161.3548422)
- [**Paper - Arxiv**](https://arxiv.org/abs/2207.11871)
- [**Github**](https://github.com/NExTplusplus/TAT-DQA)
- [**Leaderboard**](https://nextplusplus.github.io/TAT-DQA/#leaderboard)
**TAT-DQA** is a large-scale Document VQA dataset built by extending TAT-QA. It aims to advance QA research on more complex and realistic **visually-rich documents** that combine rich tabular and textual content, particularly questions that require numerical reasoning.
The unique features of TAT-DQA include:
- The documents in TAT-DQA are sampled from real-world, high-quality financial reports, and each document contains both tabular and textual data.
- Documents in TAT-DQA contain around 550 words on average, significantly more than documents in existing Document VQA datasets.
- Around 85% of the documents in the dataset have only one page, while 15% have multiple pages.
- Similar to TAT-QA, the answer forms are diverse, including single spans, multiple spans, and free-form answers. Answering usually requires various numerical reasoning capabilities, including addition (+), subtraction (-), multiplication (x), division (/), counting, comparison, sorting, and their compositions.
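To make the listed operations concrete, here is a minimal Python sketch of how a composed derivation (a percentage change, which composes subtraction, division, and multiplication) and the counting, comparison, and sorting operations can be evaluated over values read from a table. The table, its row names, and its numbers are invented for illustration and are not taken from TAT-DQA; the `Op` expression tree and the `evaluate` helper are likewise hypothetical choices, not the dataset's annotation format or the authors' method.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical table values (in millions) invented for illustration;
# these numbers are NOT taken from TAT-DQA.
TABLE = {
    "revenue_2019": 1250.0,
    "revenue_2018": 1000.0,
    "cost_2019": 700.0,
    "cost_2018": 640.0,
}


@dataclass
class Op:
    """Binary arithmetic node; symbol is one of '+', '-', 'x', '/'."""
    symbol: str
    left: "Expr"
    right: "Expr"


# An operand is a literal number, a table cell name, or a nested Op.
Expr = Union[float, str, Op]


def evaluate(expr: Expr) -> float:
    """Recursively evaluate an expression tree against TABLE."""
    if isinstance(expr, Op):
        a, b = evaluate(expr.left), evaluate(expr.right)
        if expr.symbol == "+":
            return a + b
        if expr.symbol == "-":
            return a - b
        if expr.symbol == "x":
            return a * b
        if expr.symbol == "/":
            return a / b
        raise ValueError(f"unknown operator: {expr.symbol!r}")
    if isinstance(expr, str):
        return TABLE[expr]  # look up a cell by its name
    return float(expr)


# Composition: percentage change = (2019 - 2018) / 2018 x 100
pct_change = Op("x", Op("/", Op("-", "revenue_2019", "revenue_2018"), "revenue_2018"), 100.0)
print(evaluate(pct_change))  # 25.0

# Counting: how many 2019 line items exceed 500?
count_2019_over_500 = sum(1 for k, v in TABLE.items() if k.endswith("_2019") and v > 500)

# Comparison: did cost increase from 2018 to 2019?
cost_increased = TABLE["cost_2019"] > TABLE["cost_2018"]

# Sorting: line items ordered from largest to smallest value
ranking = sorted(TABLE, key=TABLE.get, reverse=True)
```

Representing the derivation as a tree keeps each operator explicit, which mirrors how a single answer may chain several of the listed operations.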
In total, TAT-DQA contains 16,558 questions associated with 2,758 documents (3,067 document pages) sampled from real-world financial reports.
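As a quick sense of scale, the headline figures above work out to roughly six questions per document and about 1.11 pages per document. The short calculation below uses only the numbers stated in this card; the variable names are illustrative.

```python
# Totals as stated in the dataset card.
questions = 16_558
documents = 2_758
pages = 3_067

# Simple averages derived from the totals.
questions_per_doc = questions / documents
pages_per_doc = pages / documents

print(f"{questions_per_doc:.2f} questions per document")  # 6.00 questions per document
print(f"{pages_per_doc:.2f} pages per document")  # 1.11 pages per document
```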
## Citation
```bibtex
@inproceedings{zhu2022towards,
title={Towards complex document understanding by discrete reasoning},
author={Zhu, Fengbin and Lei, Wenqiang and Feng, Fuli and Wang, Chao and Zhang, Haozhou and Chua, Tat-Seng},
booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
pages={4857--4866},
year={2022}
}
```
Provided by:
next-tat