SimpleVQA

Name: SimpleVQA
Creator: maas
Published: 2026-05-16 09:10:44
License: No description available

ModelScope Community | Updated 2026-05-16 | Indexed 2025-09-20

Download link:

https://modelscope.cn/datasets/m-a-p/SimpleVQA


Resource overview:

# SimpleVQA

### SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models

**Dataset:** https://huggingface.co/datasets/m-a-p/SimpleVQA

## Abstract

The increasing application of multimodal large language models (MLLMs) across various sectors has spotlighted the importance of output reliability and accuracy, particularly the ability to produce content grounded in factual information (e.g., common and domain-specific knowledge). In this work, we introduce SimpleVQA, the first comprehensive multimodal benchmark for evaluating the factuality of MLLMs when they answer short natural-language questions. SimpleVQA is characterized by six key features: it covers multiple tasks and multiple scenarios, ensures high-quality and challenging queries, maintains static and timeless reference answers, and is straightforward to evaluate. Our approach categorizes visual question-answering items into 9 different tasks around objective events or common knowledge and situates them within 9 topics. Rigorous quality-control processes guarantee high-quality, concise, and clear answers, enabling low-variance evaluation via an LLM-as-a-judge scoring system. Using SimpleVQA, we comprehensively assess 18 leading MLLMs and 8 text-only LLMs, examining their image comprehension and text generation abilities by identifying and analyzing error cases.

## Dataset Building

![](images/benchmarks.png)

![image_list](images/image_list.png)

![](images/dataset_statistics.png)

## Main Results

![](images/llm_res.png)

![mllm1](images/mllm1.png)

![mllm2](images/mllm2.png)

![trace](images/trace.png)

## Citation

Please consider citing this work in your publications if it helps your research.
```tex
@misc{cheng2025simplevqamultimodalfactualityevaluation,
  title={SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models},
  author={Xianfu Cheng and Wei Zhang and Shiwei Zhang and Jian Yang and Xiangyuan Guan and Xianjie Wu and Xiang Li and Ge Zhang and Jiaheng Liu and Yuying Mai and Yutao Zeng and Zhoufutu Wen and Ke Jin and Baorui Wang and Weixiao Zhou and Yunhong Lu and Tongliang Li and Wenhao Huang and Zhoujun Li},
  year={2025},
  eprint={2502.13059},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.13059},
}
```

## Acknowledgements

- [https://openai.com/index/introducing-simpleqa/](https://openai.com/index/introducing-simpleqa/)
- [https://openstellarteam.github.io/ChineseSimpleQA/](https://openstellarteam.github.io/ChineseSimpleQA/)
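The abstract mentions an LLM-as-a-judge scoring system but does not define the aggregate metrics it produces. The following is a minimal sketch, assuming SimpleVQA follows the SimpleQA and Chinese SimpleQA conventions credited in the Acknowledgements: the judge labels each answer as correct, incorrect, or not attempted, and the scores are computed from those labels. The label strings and the `factuality_metrics` function are hypothetical names for illustration, and the judge call itself is out of scope.

```python
from collections import Counter
from typing import Iterable

# Hypothetical label strings, assuming a SimpleQA-style judge that assigns each
# model answer exactly one of these three grades.
CORRECT, INCORRECT, NOT_ATTEMPTED = "correct", "incorrect", "not_attempted"


def factuality_metrics(grades: Iterable[str]) -> dict[str, float]:
    """Aggregate per-question judge labels into SimpleQA-style metrics (assumed)."""
    counts = Counter(grades)
    unknown = set(counts) - {CORRECT, INCORRECT, NOT_ATTEMPTED}
    if unknown:
        raise ValueError(f"unexpected grade labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no grades supplied")

    attempted = counts[CORRECT] + counts[INCORRECT]
    co = counts[CORRECT] / total           # CO: correct over all questions
    na = counts[NOT_ATTEMPTED] / total     # NA: share of abstentions
    inc = counts[INCORRECT] / total        # IN: incorrect over all questions
    # CGA: correct over attempted questions (0 if the model never answered)
    cga = counts[CORRECT] / attempted if attempted else 0.0
    # F-score: harmonic mean of CO and CGA (0 when both are 0)
    f = 2 * co * cga / (co + cga) if co + cga else 0.0
    return {"CO": co, "NA": na, "IN": inc, "CGA": cga, "F": f}


if __name__ == "__main__":
    demo = ["correct"] * 6 + ["incorrect"] * 2 + ["not_attempted"] * 2
    print({k: round(v, 3) for k, v in factuality_metrics(demo).items()})
```

For 6 correct, 2 incorrect, and 2 not-attempted answers, this sketch gives CO = 0.6, CGA = 0.75, and F ≈ 0.667. Combining CO and CGA in a harmonic mean means a model cannot raise its F-score just by abstaining on hard questions (which inflates CGA) or by guessing on everything (which inflates attempts but not CGA).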

# SimpleVQA ### SimpleVQA：面向多模态大语言模型的多模态事实性评估 **数据集地址：** https://huggingface.co/datasets/m-a-p/SimpleVQA ## 摘要随着多模态大语言模型（Multimodal Large Language Model，MLLM）在各领域的应用日益广泛，其输出的可靠性与准确性愈发受到关注，尤其是模型生成基于事实信息（如通用知识与领域专业知识）内容的能力。本工作提出SimpleVQA，这是首个用于评估多模态大语言模型回答自然语言短问题时的事实性能力的综合性多模态基准测试集。SimpleVQA具备六大核心特性：覆盖多任务与多场景、保证查询的高质量与挑战性、拥有静态且无时效性的参考答案，且评估流程简便易行。本研究将视觉问答条目围绕客观事件或通用知识划分为9类不同任务，并将其归入9个主题范畴。研究采用严格的质量控制流程，确保答案高质量、简洁且清晰，并通过大语言模型作为评判者（LLM-as-a-judge）的评分体系，实现低方差的便捷评估。借助SimpleVQA，研究团队对18款主流多模态大语言模型与8款纯文本大语言模型开展了全面评估，通过识别与分析错误案例，深入探究模型的图像理解与文本生成能力。 ## 数据集构建 ![](images/benchmarks.png) ![image_list](images/image_list.png) ![](images/dataset_statistics.png) ## 主要实验结果 ![](images/llm_res.png) ![mllm1](images/mllm1.png) ![mllm2](images/mllm2.png) ![trace](images/trace.png) ## 引用声明若该数据集对你的研究有所助益，请在发表成果中引用本工作。 tex @misc{cheng2025simplevqamultimodalfactualityevaluation, title={SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models}, author={Xianfu Cheng and Wei Zhang and Shiwei Zhang and Jian Yang and Xiangyuan Guan and Xianjie Wu and Xiang Li and Ge Zhang and Jiaheng Liu and Yuying Mai and Yutao Zeng and Zhoufutu Wen and Ke Jin and Baorui Wang and Weixiao Zhou and Yunhong Lu and Tongliang Li and Wenhao Huang and Zhoujun Li}, year={2025}, eprint={2502.13059}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2502.13059}, } ## 致谢 - [https://openai.com/index/introducing-simpleqa/](https://openai.com/index/introducing-simpleqa/) - [https://openstellarteam.github.io/ChineseSimpleQA/](https://openstellarteam.github.io/ChineseSimpleQA/)

Provider: maas

Created: 2025-08-28