
FVQA

ModelScope community · Updated 2026-01-06 · Indexed 2025-08-02
Download link:
https://modelscope.cn/datasets/lmms-lab/FVQA
Resource description:
## Factual Visual Question Answering (FVQA)

### Dataset Summary

FactualVQA (FVQA) is a multimodal Visual Question Answering dataset created for search-augmented training and evaluation. It emphasizes knowledge-intensive questions that require external information beyond the given image. Each entry includes an image, a question, and an answer (optionally accompanied by candidate answers), enabling models to develop and refine on-demand search strategies. Details of dataset construction are provided in the [blog](https://www.lmms-lab.com/posts/mmsearch_r1/) or the [paper](https://arxiv.org/abs/2506.20670).

### Dataset Structure

- Data Fields

  The datasets are stored in Parquet format and include the following columns:

  - `data_id`: Unique data ID
  - `prompt`: The user question
  - `images`: Raw image data stored as bytes
  - `reward_model`: Ground truth and candidate answers used for reward calculation
  - `data_source`: Specifies which reward function to use in veRL (e.g., mmsearch_r1/fvqa_train, mmsearch_r1/fvqa_test)
  - `image_urls`: Optional field for potential use with the image search tool
  - `category`: Annotation marking the sample as search-required or search-free
- Train/Test Split
  - Train: ~5k samples, with approximately 68% search-required and 32% search-free (estimated using Qwen2.5-VL-7B-Instruct)
  - Test: ~1.8k samples
- Source
  - Image Sources: Google Image Search, subset of [InfoSeek](https://github.com/open-vision-language/infoseek)'s training split
  - QA Sources: GPT-4o-generated, human-annotated (for the test split), subset of InfoSeek's training split
- Cached Image Search Results of FVQA
  - Cached image search results (relevant webpage titles and thumbnail image URLs) for the images in the FVQA dataset, indexed by `data_id`
    - fvqa_train_image_search_results_cache.pkl
    - fvqa_test_image_search_results_cache.pkl
  - Since the webpage thumbnail URLs from SerpAPI's search results include both strings and `PIL.Image` objects (e.g., `<class 'PIL.JpegImagePlugin.JpegImageFile'>`), you may need to `pip3 install pillow==11.1.0` to load the pickle files

### Citation

```
@article{wu2025mmsearch,
  title={MMSearch-R1: Incentivizing LMMs to Search},
  author={Wu, Jinming and Deng, Zihao and Li, Wei and Liu, Yiding and You, Bo and Li, Bo and Ma, Zejun and Liu, Ziwei},
  journal={arXiv preprint arXiv:2506.20670},
  year={2025}
}
```
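
### Example: reading the cached search results

The cached image search results ship as pickle files whose thumbnails mix URL strings with `PIL.Image` objects. The following is a minimal sketch of how such a file might be read, not the authors' loader. The helper names `load_search_cache` and `split_thumbnails` are hypothetical, and the internal layout of each cache entry is not documented here, so the sketch assumes only that thumbnails arrive as some iterable of mixed values. Inspect a real entry before relying on any particular key.

```python
import pickle
from pathlib import Path


def load_search_cache(path):
    """Load a cached image-search results pickle (hypothetical helper).

    Only unpickle files from a source you trust: pickle can execute
    arbitrary code on load. Pillow must be importable when the file
    contains PIL.Image objects, otherwise unpickling fails.
    """
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def split_thumbnails(thumbnails):
    """Separate URL strings from in-memory image objects (hypothetical helper).

    Assumption: `thumbnails` is an iterable mixing `str` URLs with
    non-string objects such as PIL images.
    """
    urls = []
    images = []
    for item in thumbnails:
        if isinstance(item, str):
            urls.append(item)
        else:
            images.append(item)
    return urls, images


if __name__ == "__main__":
    cache_path = Path("fvqa_train_image_search_results_cache.pkl")
    if cache_path.exists():
        cache = load_search_cache(cache_path)
        # Print only the container type; the entry schema is an open question.
        print(type(cache).__name__)
```

Separating the two thumbnail kinds early keeps downstream code from having to branch on type each time it touches a thumbnail.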

Provider:
maas
Created:
2025-07-31