tabfquad_test_subsampled
Source: ModelScope community. Updated 2025-12-05. Indexed 2025-06-07.
Download link:
https://modelscope.cn/datasets/vidore/tabfquad_test_subsampled
Resource description:
## Dataset Description
TabFQuAD (Table French Question Answering Dataset) is designed to evaluate TableQA models in realistic industry settings. Using a vision language model (GPT4V), we create additional queries to augment the existing human-annotated ones.
### Data Curation
To ensure homogeneity across our benchmarked datasets, we subsampled the original test set to 280 pairs, left the remaining pairs for training, and renamed the columns.
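The card does not say how the 280 pairs were chosen. As a rough illustration only, the sketch below assumes uniform random sampling with a fixed seed. The function name `split_test_subsample`, the seed value, and the example pool size are hypothetical and are not taken from the dataset authors.

```python
import random


def split_test_subsample(n_items, n_test=280, seed=0):
    """Split indices 0..n_items-1 into a test subsample and a training remainder.

    Assumption: uniform random sampling without replacement. The actual
    procedure used for this dataset is not documented in the card.
    """
    if n_test > n_items:
        raise ValueError("n_test cannot exceed the number of available items")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    test_idx = sorted(rng.sample(range(n_items), n_test))
    test_set = set(test_idx)
    # Everything not drawn for the test subsample is left for training.
    train_idx = [i for i in range(n_items) if i not in test_set]
    return test_idx, train_idx


# Hypothetical pool of 1000 pairs, purely for illustration.
test_idx, train_idx = split_test_subsample(1000)
print(len(test_idx), len(train_idx))
```

With index lists like these, a Hugging Face `Dataset` could be split using its `select` method, which keeps the two parts disjoint.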
### Load the dataset
```python
from datasets import load_dataset
ds = load_dataset("vidore/tabfquad_test_subsampled", split="test")
```
### Dataset Structure
Here is an example of a dataset instance structure:
```yaml
features:
  - name: query
    dtype: string
  - name: image_filename
    dtype: string
  - name: generated_by
    dtype: string
  - name: GPT4 caption
    dtype: string
  - name: image
    dtype: image
  - name: source
    dtype: string
```
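To make the feature list above concrete, here is a minimal sketch that checks a loaded split against those column names and tallies the `generated_by` values. The helpers `missing_columns`, `count_values`, and `load_and_summarize` are illustrative names, not part of the dataset or the `datasets` library. No particular `generated_by` values are assumed.

```python
from collections import Counter

# Column names copied from the feature list in the dataset card.
EXPECTED_COLUMNS = [
    "query",
    "image_filename",
    "generated_by",
    "GPT4 caption",
    "image",
    "source",
]


def missing_columns(column_names):
    """Return the expected columns that are absent from column_names."""
    present = set(column_names)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def count_values(values):
    """Count how often each value occurs, e.g. in the generated_by column."""
    return Counter(values)


def load_and_summarize():
    """Load the test split and print a short summary (needs network access)."""
    # Imported lazily so the helpers above work without the datasets package.
    from datasets import load_dataset

    ds = load_dataset("vidore/tabfquad_test_subsampled", split="test")
    print("rows:", len(ds))
    print("missing columns:", missing_columns(ds.column_names))
    # Reading a single string column avoids decoding the images.
    print("generated_by counts:", count_values(ds["generated_by"]))
```

Calling `load_and_summarize()` downloads the data on first use. The two helpers can be tried on plain Python lists without any download.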
Provider:
maas
Created:
2025-06-04



