FVQA

Name: FVQA
Creator: maas
Published: 2026-01-06 16:40:37
License: No description available

ModelScope community; updated 2026-01-06, indexed 2025-08-02

Download link:

https://modelscope.cn/datasets/lmms-lab/FVQA


Resource description:

## Factual Visual Question Answering (FVQA)

### Dataset Summary

FactualVQA (FVQA) is a multimodal Visual Question Answering dataset created for search-augmented training and evaluation. It emphasizes knowledge-intensive questions that require information beyond the given image. Each entry includes an image, a question, and an answer (optionally accompanied by candidate answers), enabling models to develop and refine on-demand search strategies. For details of dataset construction, see the [blog](https://www.lmms-lab.com/posts/mmsearch_r1/) or the [paper](https://arxiv.org/abs/2506.20670).

### Dataset Structure

- Data Fields

  The datasets are stored in Parquet format and include the following columns:
  - `data_id`: Unique data ID
  - `prompt`: The user question
  - `images`: Raw image data stored as bytes
  - `reward_model`: Ground-truth and candidate answers used for reward calculation
  - `data_source`: Specifies which reward function to use in veRL (e.g., `mmsearch_r1/fvqa_train`, `mmsearch_r1/fvqa_test`)
  - `image_urls`: Optional field for potential use with the image search tool
  - `category`: Annotation marking each question as search-required or search-free

- Train/Test Split
  - Train: ~5k samples, approximately 68% search-required and 32% search-free (estimated using Qwen2.5-VL-7B-Instruct)
  - Test: ~1.8k samples

- Source
  - Image sources: Google Image Search; a subset of [InfoSeek](https://github.com/open-vision-language/infoseek)'s training split
  - QA sources: generated by GPT-4o; human-annotated (test split); a subset of InfoSeek's training split

- Cached Image Search Results of FVQA

  Cached image search results (relevant webpage titles and thumbnail image URLs) for the images in the FVQA dataset, indexed by `data_id`:
  - `fvqa_train_image_search_results_cache.pkl`
  - `fvqa_test_image_search_results_cache.pkl`

  Because the webpage thumbnails in SerpAPI's search results are a mix of strings and `PIL.Image` objects (e.g., `<class 'PIL.JpegImagePlugin.JpegImageFile'>`), you may need to run `pip3 install pillow==11.1.0` to load the pickle files.

### Citation

```
@article{wu2025mmsearch,
  title={MMSearch-R1: Incentivizing LMMs to Search},
  author={Wu, Jinming and Deng, Zihao and Li, Wei and Liu, Yiding and You, Bo and Li, Bo and Ma, Zejun and Liu, Ziwei},
  journal={arXiv preprint arXiv:2506.20670},
  year={2025}
}
```
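The `category` column and the cached search-result pickles described in the dataset structure can be inspected with a few lines of Python. The following is a minimal sketch, not part of the dataset's own tooling: `category_breakdown`, `load_search_cache`, and `cached_results` are hypothetical helper names, the exact `category` label strings are not documented in this card, and each cache file is assumed to unpickle to a mapping keyed by `data_id`.

```python
import pickle
from collections import Counter
from typing import Any, Iterable, Mapping


def category_breakdown(rows: Iterable[Mapping[str, Any]]) -> dict[Any, float]:
    """Return the fraction of rows carrying each `category` label.

    Each row is a dict-like record with a `category` key. The label strings
    themselves (e.g. "search_required") are an assumption, not documented.
    """
    counts = Counter(row["category"] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def load_search_cache(path: str) -> Any:
    """Unpickle one of the fvqa_*_image_search_results_cache.pkl files.

    Some cached thumbnails are PIL.Image objects, so Pillow must be
    installed for unpickling to succeed. pickle can execute arbitrary code,
    so only load files from a source you trust.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def cached_results(cache: Mapping[Any, Any], data_ids: Iterable[Any]) -> dict[Any, Any]:
    """Return the cached search results for the given data_ids, skipping misses.

    Assumes the cache is a mapping keyed by `data_id`; the structure of each
    value (titles, thumbnail URLs or images) is left untouched.
    """
    return {d: cache[d] for d in data_ids if d in cache}
```

To feed `category_breakdown` real records, one option (assuming `pyarrow` is installed and a split has been downloaded locally to some `path`) is `pyarrow.parquet.read_table(path, columns=["category"]).to_pylist()`, which reads only the `category` column and skips the raw image bytes.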

Provider:

maas

Created:

2025-07-31