FVQA
Source: ModelScope community. Updated 2026-01-06; indexed 2025-08-02.
Download link:
https://modelscope.cn/datasets/lmms-lab/FVQA
Description:
## Factual Visual Question Answering (FVQA)
### Dataset Summary
FactualVQA (FVQA) is a multimodal Visual Question Answering dataset created for search-augmented training and evaluation. It emphasizes knowledge-intensive questions that require external information beyond the given image. Each entry includes an image, a question, and an answer (optionally accompanied by candidate answers), enabling models to develop and refine on-demand search strategies. Details of dataset construction are provided in the [blog](https://www.lmms-lab.com/posts/mmsearch_r1/) and the [paper](https://arxiv.org/abs/2506.20670).
### Dataset Structure
- Data Fields
  - The datasets are stored in Parquet format and include the following columns:
    - `data_id`: Unique ID of the sample
    - `prompt`: The user question
    - `images`: Raw image data stored as bytes
    - `reward_model`: Ground truth and candidate answers used for reward calculation
    - `data_source`: Specifies which reward function to use in veRL (e.g., mmsearch_r1/fvqa_train, mmsearch_r1/fvqa_test)
    - `image_urls`: Optional field for potential use with the image search tool
    - `category`: Annotation marking the sample as search-required or search-free
- Train/Test Split
  - Train: ~5k samples, with approximately 68% search-required and 32% search-free (estimated using Qwen2.5-VL-7B-Instruct)
  - Test: ~1.8k samples
- Source
  - Image Sources: Google Image Search and a subset of [InfoSeek](https://github.com/open-vision-language/infoseek)'s training split
  - QA Sources: GPT-4o-generated, human-annotated (for the test split), and a subset of InfoSeek's training split
- Cached Image Search Results of FVQA
  - Cached image search results (relevant webpage titles and thumbnail image URLs) for the images in the FVQA dataset, indexed by `data_id`:
    - fvqa_train_image_search_results_cache.pkl
    - fvqa_test_image_search_results_cache.pkl
  - The webpage thumbnails in SerpAPI's search results are stored both as URL strings and as `PIL.Image` objects (e.g., `<class 'PIL.JpegImagePlugin.JpegImageFile'>`), so you may need to run `pip3 install pillow==11.1.0` before loading the pickle files
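
The columns listed under Data Fields and the cache files described above might be read along the following lines. This is a minimal sketch, not the authors' loading code: the helper names (`missing_columns`, `load_split`, `load_search_cache`, `cached_results`) are hypothetical, reading Parquet through pandas is one option among several, and the cache is assumed to unpickle into a dict keyed by `data_id`, which the card implies but does not spell out.

```python
import pickle

# Column names as listed under "Data Fields".
EXPECTED_COLUMNS = (
    "data_id", "prompt", "images", "reward_model",
    "data_source", "image_urls", "category",
)


def missing_columns(columns):
    """Return the expected column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def load_split(parquet_path):
    """Read one Parquet split and check that the documented columns exist."""
    # Imported lazily: needs pandas plus a Parquet engine such as pyarrow.
    import pandas as pd

    frame = pd.read_parquet(parquet_path)
    missing = missing_columns(frame.columns)
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return frame


def load_search_cache(pkl_path):
    """Unpickle a cached image-search file.

    Pillow must be installed, since the pickle may contain PIL.Image objects.
    Only unpickle files obtained from a source you trust.
    """
    with open(pkl_path, "rb") as handle:
        return pickle.load(handle)


def cached_results(cache, data_id):
    """Look up one sample's cached results, or None if it has no entry.

    Assumption: the unpickled object is a dict keyed by `data_id`.
    """
    return cache.get(data_id)
```

For example, `cache = load_search_cache("fvqa_train_image_search_results_cache.pkl")` followed by `cached_results(cache, some_id)` would return whatever is stored for that sample. The layout of each entry is not documented here, so inspect one before relying on specific keys.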
### Citation
```
@article{wu2025mmsearch,
title={MMSearch-R1: Incentivizing LMMs to Search},
author={Wu, Jinming and Deng, Zihao and Li, Wei and Liu, Yiding and You, Bo and Li, Bo and Ma, Zejun and Liu, Ziwei},
journal={arXiv preprint arXiv:2506.20670},
year={2025}
}
```
Provider:
maas
Created:
2025-07-31



