Sellopale/VisuLogic
Source: Hugging Face. Updated 2025-12-15; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Sellopale/VisuLogic
Description:
VisuLogic is a newly designed benchmark for evaluating the visual reasoning capabilities of multimodal large language models (MLLMs) independently of textual reasoning. It consists of carefully constructed visual reasoning tasks spanning multiple categories, grouped into six types according to the reasoning skill required (for example, Quantitative Reasoning, which involves understanding and deducing changes in the number of elements in an image). Unlike existing benchmarks, VisuLogic poses problems that are inherently difficult to describe in language, which makes it a more rigorous test of visual reasoning in MLLMs. Most models score below 30% accuracy, only slightly above the 25% random baseline and far below the 51.4% achieved by humans, revealing a significant gap in visual reasoning. The dataset contains 1,000 carefully designed questions covering 6 domains and 24 subcategories, so that tasks depend on genuine visual reasoning rather than linguistic shortcuts. It is fully open-source, including evaluation code, training scripts, and associated data.
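To make the reported figures concrete, the following is a minimal sketch of how accuracy on a multiple-choice benchmark can be compared with a random-guess baseline. It is not the benchmark's official evaluation code. The function names `accuracy` and `random_baseline` are hypothetical, and the four-option format (labels A to D) is an assumption inferred from the stated 25% baseline, not something the description confirms.

```python
from typing import Sequence


def random_baseline(num_options: int) -> float:
    """Expected accuracy of uniform random guessing over num_options choices."""
    if num_options < 1:
        raise ValueError("num_options must be at least 1")
    return 1.0 / num_options


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of predictions that match the reference answers.

    Labels are compared after stripping whitespace and upper-casing,
    so " a" and "A" count as the same choice.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty answer list")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


# Toy example with made-up labels (assumed four options, A to D).
preds = ["A", "b", "C", "D"]
gold = ["A", "B", "D", "D"]
score = accuracy(preds, gold)
margin = score - random_baseline(4)  # how far above chance the model is
```

On this scale, a model scoring just under 30% sits less than five percentage points above the assumed chance level, while the reported human score of 51.4% is more than 26 points above it.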
Provider:
Sellopale



