VisuLogic
Source: ModelScope community (updated 2026-01-07, indexed 2025-11-03)
Download link: https://modelscope.cn/datasets/evalscope/VisuLogic
Description:
# VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
**A Challenging Visual-centric Benchmark for Evaluating Multimodal Reasoning in MLLMs!**
This is the benchmark data repository for [VisuLogic](https://visulogic-benchmark.github.io/VisuLogic).
For more details, please refer to the project page with dataset exploration and visualization tools: [https://visulogic-benchmark.github.io/VisuLogic/](https://visulogic-benchmark.github.io/VisuLogic/).
# VisuLogic Resources
[**🌐 Homepage**](https://visulogic-benchmark.github.io/VisuLogic) | [**🏆 Leaderboard**](https://visulogic-benchmark.github.io/VisuLogic/) | [**📖 Paper**](https://arxiv.org/abs/2504.15279) | [**🤗 Benchmark**](https://huggingface.co/datasets/VisuLogic/VisuLogic) | [**🤗 Train Data**](https://huggingface.co/datasets/VisuLogic/VisuLogic-Train)
[**💻 Eval Code**](https://github.com/VisuLogic-Benchmark/VisuLogic-Eval) | [**💻 Train Code**](https://github.com/VisuLogic-Benchmark/VisuLogic-Train) | [**🤗 Checkpoint (7B)**](https://huggingface.co/VisuLogic/qwen2_5vl_7b_rloo_80steps_hf) | [**🤗 Checkpoint (38B)**](https://huggingface.co/VisuLogic/internvl2_5_38b_rloo_100steps_hf)
## 🔔News
- **🔥[2025-06-28] Released the [SFT data](https://huggingface.co/datasets/VisuLogic/VisuLogic-Train)! 🚀**
- **🔥[2025-04-26] [VisuLogic](https://github.com/open-compass/VLMEvalKit/pull/944) has been merged into [VLMEvalKit](https://github.com/OpenCompass/VLMEvalkit), so you can use it to evaluate your model on VisuLogic. For usage, see the [VLMEvalKit Quickstart](https://github.com/open-compass/VLMEvalKit/blob/main/docs/en/Quickstart.md)! 🚀**
- **🔥[2025-04-22] Released the paper, training data, and training code! 🚀**
- **🔥[2025-04-08] Released the benchmark and the code! 🚀**
## ✅ To-do
- [x] Release the benchmark dataset and eval code
- [x] Release training code
- [x] Release the paper
- [x] Release the training dataset
- [x] Release model ckpts
## 📖 Introduction
VisuLogic is a benchmark for evaluating the visual reasoning capabilities of Multi-modal Large Language Models (MLLMs) independently of textual reasoning. Its carefully constructed tasks span multiple categories and are divided into six types according to the reasoning skill required (e.g., Quantitative Reasoning, which involves understanding and deducing changes in the quantity of elements in images). Unlike existing benchmarks, VisuLogic poses problems that are inherently difficult to articulate in language, which makes it a more rigorous test of the visual reasoning capabilities of MLLMs. Most models score below 30% accuracy, only slightly above the 25% random baseline and far below the 51.4% achieved by humans, revealing significant gaps in visual reasoning.
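
To make these numbers concrete, the following is a minimal sketch of how accuracy on a multiple-choice benchmark compares with random guessing. It assumes each question has four options labeled A to D (inferred from the 25% random baseline, not stated in this README). The function names and the toy answers are hypothetical, and this is not the official VisuLogic-Eval code.

```python
def accuracy(predictions, answers):
    """Fraction of predictions that exactly match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def random_baseline(num_options=4):
    """Expected accuracy of uniform random guessing over num_options choices."""
    return 1.0 / num_options


# Toy data (hypothetical, not taken from the benchmark).
answers = ["A", "C", "B", "D", "A", "B", "C", "D"]
predictions = ["A", "B", "B", "A", "C", "B", "D", "D"]

score = accuracy(predictions, answers)
print(f"accuracy = {score:.3f}, random baseline = {random_baseline():.3f}")
```

A model is only informative on such a benchmark to the extent that its accuracy exceeds the random baseline, which is why scores just under 30% indicate weak visual reasoning.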

## 🌟 Key Features
- 🚀 **Visuo-Logical Challenge**
The first benchmark to integrate **visual perception** with **logical reasoning**, enabling authentic multimodal evaluation. Most models score below **30%** accuracy, only slightly above the 25% random baseline and far below the 51.4% achieved by humans, revealing significant gaps in visual reasoning.
- 🛠️ **Rigorous Design**
Includes **1,000 meticulously curated questions**, spanning **6 domains** and **24 subcategories**, for comprehensive performance evaluation.
- 📝 **Anti-Linguistic Shortcut**
Designed to avoid linguistic reasoning, ensuring tasks rely on **genuine visual reasoning** rather than shortcuts.
- 💡 **RL Exploration**
We identify reinforcement learning (RL) as a promising direction for improving the visual reasoning capabilities of MLLMs. Models trained with RL reach **SOTA** results on VisuLogic!
- ✅ **Fully Open-source**
We **open-source** all the evaluation code, training scripts, and datasets associated with this work to promote further research and innovation.
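
Because the questions are grouped into domains and subcategories, results are often easier to interpret when broken down per category. The sketch below shows one way to do that. The record fields (`category`, `prediction`, `answer`) and the label `Example Category` are hypothetical and may not match the released data format; only `Quantitative Reasoning` is a category named in this README.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return {category: accuracy} for records with 'category', 'prediction', 'answer' keys."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        totals[rec["category"]] += 1
        correct[rec["category"]] += int(rec["prediction"] == rec["answer"])
    return {cat: correct[cat] / totals[cat] for cat in totals}


# Toy records (hypothetical, not taken from the benchmark).
records = [
    {"category": "Quantitative Reasoning", "prediction": "A", "answer": "A"},
    {"category": "Quantitative Reasoning", "prediction": "B", "answer": "C"},
    {"category": "Example Category", "prediction": "D", "answer": "D"},
]

for category, acc in accuracy_by_category(records).items():
    print(f"{category}: {acc:.2f}")
```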
## 🖼️ Examples of VisuLogic

## 📊 Eval
Please refer to [VisuLogic-Eval](https://github.com/VisuLogic-Benchmark/VisuLogic-Eval.git) for eval code.
## 📦 Training
Please refer to [VisuLogic-Train](https://github.com/VisuLogic-Benchmark/VisuLogic-Train.git) for training code.
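
The released checkpoint names contain `rloo`, which suggests REINFORCE Leave-One-Out (RLOO) training; that reading is an assumption, and the actual recipe is defined in the VisuLogic-Train repository. Under that assumption, the core idea can be sketched as follows: for several answers sampled for the same question, each answer's advantage is its reward minus the mean reward of the other samples. The function name and the 0/1 correctness reward are hypothetical.

```python
def rloo_advantages(rewards):
    """Leave-one-out advantages: each reward minus the mean of the other rewards."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


# Hypothetical rewards for four sampled answers to one question:
# 1.0 if the chosen option is correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
print(rloo_advantages(rewards))
```

Correct samples receive a positive advantage and incorrect ones a negative advantage, and the advantages for one question sum to zero, so no separate value model is needed for the baseline.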
## 📩 Contact
- Weiye Xu: ustcxwy0271@mail.ustc.edu.cn
- Jiahao Wang: wjhwdscience@stu.xjtu.edu.cn
## 📜 Citation
**BibTeX:**
```bibtex
@article{xu2025visulogic,
title={VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models},
author={Xu, Weiye and Wang, Jiahao and Wang, Weiyun and Chen, Zhe and Zhou, Wengang and Yang, Aijun and Lu, Lewei and Li, Houqiang and Wang, Xiaohua and Zhu, Xizhou and Wang, Wenhai and Dai, Jifeng and Zhu, Jinguo},
journal={arXiv preprint arXiv:2504.15279},
year={2025},
url={https://arxiv.org/abs/2504.15279}
}
```
🎉 Thank you for your interest in VisuLogic! We hope this benchmark helps drive advancements in multimodal reasoning! 🚀
Provider: maas
Created: 2025-10-22



