
WorldVQA

ModelScope community | Updated 2026-05-24 | Indexed 2026-02-07
Download link:
https://modelscope.cn/datasets/moonshotai/WorldVQA
Resource description:
# WorldVQA

## WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models

<p align="center">
<a href="https://worldvqa2026.github.io/WorldVQA/"> HomePage</a> |
<a href="https://huggingface.co/datasets/moonshotai/WorldVQA"> Dataset</a> |
<a href="https://arxiv.org/abs/2602.02537v1"> Paper</a> |
<a href="https://github.com/MoonshotAI/WorldVQA/"> Code</a>
</p>

![alt text](images/barchart.png)

## Abstract

We introduce WorldVQA, a benchmark designed to evaluate the atomic vision-centric world knowledge of Multimodal Large Language Models (MLLMs). Current evaluations often conflate visual knowledge retrieval with reasoning. In contrast, WorldVQA decouples these capabilities to strictly measure "what the model memorizes." The benchmark assesses the atomic capability of grounding and naming visual entities across a stratified taxonomy, spanning from common head-class objects to long-tail rarities. We expect WorldVQA to serve as a rigorous test of visual factuality, thereby establishing a standard for assessing the encyclopedic breadth and hallucination rates of current and next-generation frontier models.

<img src="images/main_figure.jpg">

## Details

**WorldVQA** is a meticulously curated benchmark designed to evaluate atomic vision-centric world knowledge in Multimodal Large Language Models (MLLMs). The dataset comprises **3,000 VQA pairs** across **8 categories**, with careful attention to linguistic and cultural diversity.

> **Note:** Due to copyright concerns, the "People" category has been removed from this release. The original benchmark contains 3,500 VQA pairs across 9 categories.

![alt text](images/statistics.png)

## Leaderboard

Our evaluation reveals significant gaps in visual encyclopedic knowledge, with no model surpassing the 50% accuracy threshold. A mini-leaderboard is shown here; see our paper or homepage for more information.

### Overall Performance

The leaderboard below shows the overall performance on WorldVQA (first 8 categories, excluding "People" due to systematic refusal behaviors in closed-source models):

![alt text](images/leaderboard.png)

## Citation

If you find WorldVQA useful for your research, please cite our work:

```bibtex
@misc{zhou2026worldvqameasuringatomicworld,
  title={WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models},
  author={Runjie Zhou and Youbo Shao and Haoyu Lu and Bowei Xing and Tongtong Bai and Yujie Chen and Jie Zhao and Lin Sui and Haotian Yao and Zijia Zhao and Hao Yang and Haoning Wu and Zaida Zhou and Jinguo Zhu and Zhiqi Huang and Yiping Bao and Yangyang Liu and Y. Charles and Xinyu Zhou},
  year={2026},
  eprint={2602.02537},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.02537},
}
```
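
The leaderboard reports accuracy over the released categories. As a rough illustration of how per-category and overall accuracy could be tallied from model outputs, here is a minimal sketch. It is not the authors' evaluation code: the record keys (`category`, `answer`, `prediction`), the toy category names, and the use of normalized exact match as the scoring rule are all assumptions made for this example, and the paper's actual scoring protocol may differ.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def accuracy_by_category(records):
    """Return (per-category accuracy, overall accuracy) for a list of records.

    Each record is a dict with hypothetical keys 'category', 'answer', 'prediction'.
    Scoring is normalized exact match, which is an assumption of this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["category"]] += 1
        if normalize(rec["prediction"]) == normalize(rec["answer"]):
            correct[rec["category"]] += 1
    per_category = {cat: correct[cat] / total[cat] for cat in total}
    overall = sum(correct.values()) / sum(total.values()) if total else 0.0
    return per_category, overall


# Toy records with made-up categories and answers, for illustration only.
records = [
    {"category": "landmark", "answer": "Eiffel Tower", "prediction": "eiffel tower"},
    {"category": "landmark", "answer": "Big Ben", "prediction": "Tower Bridge"},
    {"category": "animal", "answer": "Red Panda", "prediction": "Red  Panda"},
]
per_category, overall = accuracy_by_category(records)
print(per_category, overall)
```

Reporting both numbers matters because the overall figure is weighted by category size, so a large head-class category can mask poor performance on long-tail ones.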

Provider:
maas
Created:
2026-01-28