
tatdqa_test

ModelScope community · Updated 2026-01-06 · Indexed 2025-06-07
Download link:
https://modelscope.cn/datasets/vidore/tatdqa_test
Resource description:
## Dataset Description

This is the test set taken from the [TAT-DQA dataset](https://nextplusplus.github.io/TAT-DQA/).

TAT-DQA is a large-scale Document VQA dataset that was constructed from publicly available real-world financial reports. It focuses on rich tabular and textual content requiring numerical reasoning. Questions and answers were manually annotated by human experts in finance.

Example of data (see viewer)

### Data Curation

Unlike other 'academic' datasets, we kept the full test set as this dataset closely represents our use case of document retrieval. There are 1,663 image-query pairs.

### Load the dataset

```python
from datasets import load_dataset

ds = load_dataset("vidore/tatdqa_test", split="test")
```

### Dataset Structure

Here is an example of a dataset instance structure:

```yaml
features:
  - name: questionId
    dtype: string
  - name: query
    dtype: string
  - name: question_types
    dtype: 'null'
  - name: image
    dtype: image
  - name: docId
    dtype: int64
  - name: image_filename
    dtype: string
  - name: page
    dtype: string
  - name: answer
    dtype: 'null'
  - name: data_split
    dtype: string
  - name: source
    dtype: string
```

## Citation Information

If you use this dataset in your research, please cite the original dataset as follows:

```bibtex
@inproceedings{zhu-etal-2021-tat,
    title = "{TAT}-{QA}: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance",
    author = "Zhu, Fengbin and Lei, Wenqiang and Huang, Youcheng and Wang, Chao and Zhang, Shuo and Lv, Jiancheng and Feng, Fuli and Chua, Tat-Seng",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.254",
    doi = "10.18653/v1/2021.acl-long.254",
    pages = "3277--3287"
}

@inproceedings{zhu2022towards,
    title={Towards complex document understanding by discrete reasoning},
    author={Zhu, Fengbin and Lei, Wenqiang and Feng, Fuli and Wang, Chao and Zhang, Haozhou and Chua, Tat-Seng},
    booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
    pages={4857--4866},
    year={2022}
}
```
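Because the dataset is described as a set of image-query pairs intended for document retrieval, one plausible first step is to turn the records into a query table and a relevance mapping. The sketch below is a minimal illustration, not part of the original dataset card. It assumes each record is a dict carrying the `questionId`, `query`, and `image_filename` fields listed under Dataset Structure, and that `image_filename` uniquely identifies a page image; the function name `build_retrieval_index` and the sample records are hypothetical.

```python
from collections import defaultdict


def build_retrieval_index(records):
    """Group queries by the page image they refer to.

    Assumption: `image_filename` uniquely identifies a page image, so it
    can serve as the document id when scoring retrieval.
    """
    queries = {}               # questionId -> query text
    qrels = {}                 # questionId -> relevant image filename
    pages = defaultdict(list)  # image filename -> questionIds asked about it
    for rec in records:
        qid = rec["questionId"]
        queries[qid] = rec["query"]
        qrels[qid] = rec["image_filename"]
        pages[rec["image_filename"]].append(qid)
    return queries, qrels, dict(pages)


# Hypothetical records, made up for illustration; not taken from the dataset.
sample = [
    {"questionId": "q1", "query": "What was the total revenue in 2019?",
     "image_filename": "report_a_p3.png"},
    {"questionId": "q2", "query": "What is the change in net income?",
     "image_filename": "report_a_p3.png"},
    {"questionId": "q3", "query": "How many shares were outstanding?",
     "image_filename": "report_b_p7.png"},
]

queries, qrels, pages = build_retrieval_index(sample)
print(len(queries), "queries over", len(pages), "pages")
```

With the real data, the same function could in principle be fed the `ds` object loaded above, since iterating a `datasets.Dataset` yields dict-like rows. Dropping the `image` column first (for example with `ds.select_columns(...)` in recent versions of `datasets`) would likely avoid decoding every page image just to build the mapping.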

Provider:
maas
Created:
2025-06-04