ComTQA
Download link: https://modelscope.cn/datasets/ByteDance/ComTQA
# ComTQA Dataset
## 1. Introduction
This dataset is a visual table question answering benchmark.
The images are collected from [FinTabNet](https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/f57cf3f6-e972-48ff-ab7b-3771ba7b9683/view?access_token=317644327d84f5d75b4782f97499146c78d029651a7c7ace050f4a7656033c30) and [PubTables-1M](https://huggingface.co/datasets/bsmock/pubtables-1m).
In total, it contains 9,070 QA pairs over 1,591 images.
The specific distribution of data is shown in the following table.
| | PubTables-1M | FinTabNet | Total |
| :-----| :----: | :----: | :----: |
| #images | 932 | 659 | 1,591 |
| #QA pairs | 6,232 | 2,838 | 9,070 |
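
As a quick consistency check on the table above, the following minimal sketch recomputes the totals and derives the average number of QA pairs per image for each source. The dictionary layout and the helper name `qa_per_image` are illustrative only; the counts are copied from the table.

```python
# Counts copied from the distribution table above.
counts = {
    "PubTables-1M": {"images": 932, "qa_pairs": 6232},
    "FinTabNet": {"images": 659, "qa_pairs": 2838},
}

# Column sums should match the "Total" column of the table.
total_images = sum(c["images"] for c in counts.values())
total_qa = sum(c["qa_pairs"] for c in counts.values())


def qa_per_image(images, qa_pairs):
    """Average number of QA pairs attached to one table image."""
    return qa_pairs / images


for name, c in counts.items():
    print(f"{name}: {qa_per_image(**c):.2f} QA pairs per image")
print(f"Total: {total_images} images, {total_qa} QA pairs")
```

The sums reproduce the reported totals (1,591 images and 9,070 QA pairs). On average, a PubTables-1M image carries about 6.7 questions and a FinTabNet image about 4.3.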
## 2. How to use it
* First, download [FinTabNet](https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/f57cf3f6-e972-48ff-ab7b-3771ba7b9683/view?access_token=317644327d84f5d75b4782f97499146c78d029651a7c7ace050f4a7656033c30) and [PubTables-1M](https://huggingface.co/datasets/bsmock/pubtables-1m) from their original websites. The downloaded data should be organized as follows:
```
root
└─FinTabNet
    ├─ pdf
    ├─ FinTabNet_1.0.0_cell_test.jsonl
    ├─ FinTabNet_1.0.0_cell_train.jsonl
    └─ ...
└─PubTables-1M
    ├─ PubTables-1M-Structure
        ├─ images
        ├─ ...
    ├─ PubTables-1M-Detection
        ├─ ...
```
* Second, follow the steps below to extract the corresponding images.
  + For PubTables-1M, the key "image_name" in [annotation.json](./annotation.json) is the filename of the image in "./PubTables-1M/PubTables-1M-Structure/images".
  + For FinTabNet, the key "table_id" in [annotation.json](./annotation.json) matches the key of the same name in "FinTabNet_1.0.0_cell_test.jsonl". You can crop the table images from the original PDFs using the annotations in "FinTabNet_1.0.0_cell_test.jsonl".
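
The lookup described in the two steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' tooling: it assumes that annotation.json holds a JSON list of records and that each line of the `.jsonl` file is one JSON object. The helper names (`load_annotations`, `pubtables_image_path`, `index_fintabnet`, `lookup_fintabnet`) are hypothetical. Cropping the table from the PDF is left out because it depends on the PDF library you choose and on the fields of the FinTabNet annotations.

```python
import json
from pathlib import Path


def load_annotations(path):
    """Load annotation.json (assumed here to be a JSON list of records)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def pubtables_image_path(root, record):
    """Map a record's "image_name" to its file under PubTables-1M-Structure/images."""
    return (
        Path(root)
        / "PubTables-1M"
        / "PubTables-1M-Structure"
        / "images"
        / record["image_name"]
    )


def index_fintabnet(jsonl_path):
    """Index FinTabNet cell annotations by "table_id", reading one JSON object per line.

    Keys are stored as strings so that integer and string ids compare equal
    (a defensive assumption, since the id type in annotation.json is not documented).
    """
    index = {}
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            entry = json.loads(line)
            index[str(entry["table_id"])] = entry
    return index


def lookup_fintabnet(index, record):
    """Return the FinTabNet annotation whose table_id matches the record, or None."""
    return index.get(str(record["table_id"]))
```

A typical use would be to call `index_fintabnet` once on "root/FinTabNet/FinTabNet_1.0.0_cell_test.jsonl", then call `lookup_fintabnet` for each FinTabNet record and `pubtables_image_path` for each PubTables-1M record.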
## Citation
If you find this dataset useful for your research, please consider citing our work:
```
@inproceedings{zhao2024tabpedia,
title={TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy},
author = {Weichao Zhao and Hao Feng and Qi Liu and Jingqun Tang and Binghong Wu and Lei Liao and Shu Wei and Yongjie Ye and Hao Liu and Wengang Zhou and Houqiang Li and Can Huang},
booktitle = {Advances in Neural Information Processing Systems},
year = {2024}
}
```
Provider: maas

Created: 2024-08-01



