tatdqa_test

Name: tatdqa_test
Creator: maas
Published: 2026-01-06 16:34:39
License: No description available

ModelScope community; updated 2026-01-06; indexed 2025-06-07

Download link:

https://modelscope.cn/datasets/vidore/tatdqa_test

Resource overview:

## Dataset Description

This is the test set taken from the [TAT-DQA dataset](https://nextplusplus.github.io/TAT-DQA/). TAT-DQA is a large-scale Document VQA dataset constructed from publicly available real-world financial reports. It focuses on rich tabular and textual content that requires numerical reasoning. Questions and answers were manually annotated by human experts in finance.

Example of data (see viewer)

### Data Curation

Unlike other 'academic' datasets, we kept the full test set, as this dataset closely represents our use case of document retrieval. There are 1,663 image-query pairs.

### Load the dataset

```python
from datasets import load_dataset

ds = load_dataset("vidore/tatdqa_test", split="test")
```

### Dataset Structure

Here is an example of a dataset instance structure:

```yaml
features:
  - name: questionId
    dtype: string
  - name: query
    dtype: string
  - name: question_types
    dtype: 'null'
  - name: image
    dtype: image
  - name: docId
    dtype: int64
  - name: image_filename
    dtype: string
  - name: page
    dtype: string
  - name: answer
    dtype: 'null'
  - name: data_split
    dtype: string
  - name: source
    dtype: string
```

## Citation Information

If you use this dataset in your research, please cite the original dataset as follows:

```latex
@inproceedings{zhu-etal-2021-tat,
    title = "{TAT}-{QA}: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance",
    author = "Zhu, Fengbin and Lei, Wenqiang and Huang, Youcheng and Wang, Chao and Zhang, Shuo and Lv, Jiancheng and Feng, Fuli and Chua, Tat-Seng",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.254",
    doi = "10.18653/v1/2021.acl-long.254",
    pages = "3277--3287"
}

@inproceedings{zhu2022towards,
    title={Towards complex document understanding by discrete reasoning},
    author={Zhu, Fengbin and Lei, Wenqiang and Feng, Fuli and Wang, Chao and Zhang, Haozhou and Chua, Tat-Seng},
    booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
    pages={4857--4866},
    year={2022}
}
```
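Since the card frames this test set as a document retrieval benchmark, the fields listed under Dataset Structure can be turned into query-to-page relevance pairs. The following is a minimal sketch, not part of the original card: `build_qrels` and the toy rows are hypothetical, and it assumes that `questionId` identifies a query and `image_filename` identifies the page image that answers it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def build_qrels(
    records: Iterable[Mapping[str, object]],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Return (query id -> page image, page image -> list of query ids).

    Assumption: each record carries `questionId` and `image_filename`,
    as in the schema shown on the card.
    """
    query_to_page: Dict[str, str] = {}
    page_to_queries: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        qid = str(rec["questionId"])
        page = str(rec["image_filename"])
        query_to_page[qid] = page
        page_to_queries[page].append(qid)
    return query_to_page, dict(page_to_queries)


# Synthetic rows that only mimic the schema; the values are made up.
toy_rows = [
    {"questionId": "q1", "query": "What was revenue in 2019?", "image_filename": "doc_a_p1.png"},
    {"questionId": "q2", "query": "What is the change in net income?", "image_filename": "doc_a_p1.png"},
    {"questionId": "q3", "query": "What is total debt?", "image_filename": "doc_b_p3.png"},
]

query_to_page, page_to_queries = build_qrels(toy_rows)
print(len(query_to_page), "queries over", len(page_to_queries), "distinct page images")
```

With the real data, one option is to pass `ds.select_columns(["questionId", "image_filename"])` to the function so that images are not decoded while iterating. Whether several queries actually share a page image in this split is something to check on the loaded data rather than assume.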


Provider:

maas

Created:

2025-06-04
