sashaaadaaance/SIBench

Name: sashaaadaaance/SIBench
Creator: sashaaadaaance
Published: 2026-03-18 13:28:18
License: MIT

Source: Hugging Face. Updated 2026-03-18. Indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/sashaaadaaance/SIBench


Description:

---
license: mit
---

# How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective

# About SIBench

Numerous open-source benchmarks for visual-spatial reasoning already exist, but each typically covers only a subset of tasks. We collected, categorized, and filtered them to construct **SIBench**.

![teaser](radar2.6_calibri.png)

## 💡 Key Features

1. **Hierarchical Evaluation**
   We categorize visual spatial reasoning tasks into three types by reasoning level: **Foundational Perception**, **Spatial Understanding**, and **Planning**. Each category contains a rich set of evaluation tasks to comprehensively assess the visuospatial reasoning capabilities of existing VLMs.

2. **Comprehensive Evaluation**
   The evaluation data in SIBench cover diverse input formats, including **single images**, **multi-view images**, and **videos**, as well as various question formats, such as true/false judgment, multiple-choice, and numerical question answering. The data are derived from **23** relevant tasks across nearly **20** open-source benchmarks.

3. **High Quality**
   SIBench prioritizes datasets with human annotations, filters out excessively long videos to avoid unreasonable task settings, and adds timestamps to videos that require temporal information.

## 👨‍💻 Code

We offer a comprehensive evaluation methodology. For more details, please refer to our evaluation [code](https://github.com/song2yu/SIBench-VSR) and [project page](https://sibench.github.io/Awesome-Visual-Spatial-Reasoning/).

## 📊 Dataset

SIBench contains a total of **8.8K** data points. The data formats include single images, multiple images, and videos, while the question types include true/false, multiple-choice, and numerical questions. Additionally, we provide a streamlined version for evaluation called SIBench-mini, whose data is randomly selected from SIBench. SIBench-mini keeps the same comprehensive task settings as the full version, but with a more uniform data distribution.

![data](cognitive_levels.png)

## 🎯 Evaluation Results

We provide a [leaderboard](https://sibench.github.io/Awesome-Visual-Spatial-Reasoning/), and we welcome you to add your evaluation results. Please feel free to contact us directly at: sduyusong@gmail.com.

![table2](table2.png)

![table1](table1.png)

## 📖 Citation

If you find *SIBench* useful in your research, please consider citing the following paper:

```
@article{sibench2025,
  title   = {How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective},
  author  = {Songsongyu, Yuxin Chen, Hao Ju, Lianjie Jia, Shaofei Huang, Rundi Cui, Yuhan Wu, Binghao Ran, Zaibin Zhang, Zhedong Zheng, Zhipeng Zhang, Yifan Wang, Lin Song, Lijun Wang, Yanwei Li, Ying Shan, Huchuan Lu},
  journal = {arXiv preprint},
  year    = {2025}
}
```
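The Dataset section describes SIBench-mini as a random subset of SIBench with a more uniform data distribution, but does not say how the subset was drawn. As a rough illustration only, one plausible way to get that effect is to cap the number of randomly chosen samples per task. Everything in the sketch below is an assumption made for the example: the `task` field, the task names, the cap value, and the toy records are hypothetical and are not taken from the SIBench files or the authors' code.

```python
import random
from collections import Counter, defaultdict


def sample_uniform_subset(records, per_task_cap, seed=0):
    """Randomly keep at most `per_task_cap` records from each task.

    Capping large tasks while keeping small ones whole flattens the
    per-task distribution. This is an illustrative strategy, not the
    procedure used to build SIBench-mini.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible

    # Group records by their (hypothetical) "task" field.
    by_task = defaultdict(list)
    for rec in records:
        by_task[rec["task"]].append(rec)

    subset = []
    for task in sorted(by_task):  # sorted for a deterministic order
        items = by_task[task]
        k = min(per_task_cap, len(items))
        subset.extend(rng.sample(items, k))
    return subset


# Toy records with made-up task names and an imbalanced distribution.
records = (
    [{"task": "counting", "id": i} for i in range(50)]
    + [{"task": "relative_distance", "id": i} for i in range(8)]
    + [{"task": "route_planning", "id": i} for i in range(20)]
)

mini = sample_uniform_subset(records, per_task_cap=10)
task_counts = Counter(rec["task"] for rec in mini)
print(task_counts)
```

With this cap, the two larger toy tasks contribute 10 records each and the smallest task keeps all 8, so the subset is far more balanced than the 50/8/20 split it was drawn from.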


Provider:

sashaaadaaance
