Jigsaw-Puzzles
Source: arXiv. Indexed 2025-09-30.
Download link:
https://zesen01.github.io/jigsaw-puzzles
Description:
Jigsaw-Puzzles is a new benchmark of 1,100 carefully curated real-world images with high spatial complexity, designed to evaluate the spatial perception, structural understanding, and reasoning capabilities of vision-language models (VLMs). Its tasks are designed to minimize reliance on domain-specific knowledge, and they highlight the performance gap between human participants and VLMs.



