arxivqa_test_subsampled
Source: ModelScope community. Updated: 2025-11-27. Indexed: 2025-06-07.
Download link:
https://modelscope.cn/datasets/vidore/arxivqa_test_subsampled
Description:
## Dataset Description
This is a visual question answering (VQA) dataset built from figures extracted from arXiv publications. The data is drawn from the ArXivQA dataset introduced in [Multimodal ArXiv](https://arxiv.org/abs/2403.00231). The questions were generated synthetically using GPT-4 Vision.
### Data Curation
To ensure homogeneity across our benchmarked datasets, we subsampled the original test set to 500 pairs. Furthermore, we renamed the columns for our purposes.
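The subsampling step can be illustrated with a minimal sketch. The sampling method is not stated here, so a seeded random draw without replacement is an assumption, as are the helper name `subsample`, the seed, and the 2000-item stand-in list.

```python
import random


def subsample(pairs, k=500, seed=0):
    """Return k items drawn without replacement, kept in their original order.

    Hypothetical helper: the actual procedure used to build this dataset
    is not documented, so a seeded random draw is assumed.
    """
    if k >= len(pairs):
        return list(pairs)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(pairs)), k))
    return [pairs[i] for i in keep]


# Stand-in for the original test set; the size and values are made up.
original = [
    {"query": f"question {i}", "image_filename": f"fig_{i}.jpg"}
    for i in range(2000)
]
subset = subsample(original)
print(len(subset))
```

Fixing the seed makes the subset reproducible, which matters when the same 500 pairs have to be reused across benchmark runs.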
### Load the dataset
```python
from datasets import load_dataset
ds = load_dataset("vidore/arxivqa_test_subsampled", split="test")
```
### Dataset Structure
The dataset exposes the following features:
```yaml
features:
  - name: query
    dtype: string
  - name: image
    dtype: image
  - name: image_filename
    dtype: string
  - name: options
    dtype: string
  - name: answer
    dtype: string
  - name: page
    dtype: string
  - name: model
    dtype: string
  - name: prompt
    dtype: string
  - name: source
    dtype: string
```
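As a quick sanity check, a record can be compared against the feature list above. The sketch below uses only the standard library; the helper `schema_problems` and the example values are hypothetical, and the image is represented by a placeholder rather than a decoded image object.

```python
# Field names taken from the feature list; every field except `image` is a string.
STRING_FIELDS = (
    "query", "image_filename", "options", "answer",
    "page", "model", "prompt", "source",
)
ALL_FIELDS = STRING_FIELDS + ("image",)


def schema_problems(record):
    """Return a list of human-readable schema violations (empty if none)."""
    problems = []
    for name in ALL_FIELDS:
        if name not in record:
            problems.append(f"missing field: {name}")
    for name in STRING_FIELDS:
        if name in record and not isinstance(record[name], str):
            problems.append(
                f"{name} should be a string, got {type(record[name]).__name__}"
            )
    return problems


# Made-up record for illustration only; real values will differ.
example = {
    "query": "What does the dashed line represent?",
    "image": b"<placeholder for image data>",
    "image_filename": "example_figure.jpg",
    "options": "['A. ...', 'B. ...']",
    "answer": "A",
    "page": "",
    "model": "",
    "prompt": "",
    "source": "",
}
print(schema_problems(example))
```

Note that `page` is declared as a string, so a numeric page value would be reported as a violation.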
## Citation Information
If you use this dataset in your research, please cite the original dataset as follows:
```bibtex
@misc{li2024multimodal,
      title={Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models},
      author={Lei Li and Yuqi Wang and Runxin Xu and Peiyi Wang and Xiachong Feng and Lingpeng Kong and Qi Liu},
      year={2024},
      eprint={2403.00231},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
Provider: maas
Created: 2025-06-04



