Download link:

https://modelscope.cn/datasets/Virgo-Internal/OmniSpatial

Resource description:

# OmniSpatial

This repository contains the data presented in [OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models](https://huggingface.co/papers/2506.03135).

## Task Schema Documentation

This document provides a structured explanation of the task schema for the visual-spatial reasoning benchmark.

---

## Schema Structure

The schema is represented in JSON format, containing the following key components:

| Key | Description |
| --------------------- | ------------------------------------------------------------ |
| **id** | Identifier for the question, formatted as `{image_number}_{question_number}`. |
| **question** | The prompt or query that needs to be answered based on visual-spatial reasoning. |
| **options** | A list of possible answer choices for the question. |
| **answer** | The index of the correct answer (Ground Truth, GT) within the `options` list. |
| **task_type** | The main category of the reasoning task, with four types: |
| | - `Dynamic_Reasoning`: Analyzing motion or changes over time. |
| | - `Spatial_Interaction`: Understanding spatial relationships and object interactions. |
| | - `Complex_Logic`: Multi-step logical reasoning involving spatial or interactive elements. |
| | - `Perspective_Taking`: Reasoning about the scene from different viewpoints or observer positions. |
| **sub_task_type** | A more specific categorization of the task, for example, `Motion_Analysis` under `Dynamic_Reasoning`. |
| **sub_sub_task_type** | An additional layer of task categorization, currently not provided but planned for future updates. |

---

## Example

Below is an example schema instance:

```json
{
  "id": "15_1",
  "question": "If the giraffe on the right reaches the camera in 4 s, what is its speed?",
  "options": [
    "10.9m/s",
    "0.9m/s",
    "35.7m/s",
    "14.7m/s"
  ],
  "answer": 1,
  "task_type": "Dynamic_Reasoning",
  "sub_task_type": "Motion_Analysis"
}
```

Project Page: https://qizekun.github.io/omnispatial/

Github: https://github.com/qizekun/OmniSpatial
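
The schema described above can be exercised with a short Python sketch that parses the example record, splits the `id` into its two parts, and looks up the answer text. This is only an illustration: `parse_record` and the `RECORD_JSON` constant are hypothetical names, not part of the dataset tooling, and the lookup assumes that `answer` is a zero-based index into `options`, which the schema text does not state explicitly.

```python
import json

# Example record copied from the schema documentation.
RECORD_JSON = """
{
  "id": "15_1",
  "question": "If the giraffe on the right reaches the camera in 4 s, what is its speed?",
  "options": ["10.9m/s", "0.9m/s", "35.7m/s", "14.7m/s"],
  "answer": 1,
  "task_type": "Dynamic_Reasoning",
  "sub_task_type": "Motion_Analysis"
}
"""

# The four main task categories listed in the schema table.
TASK_TYPES = {
    "Dynamic_Reasoning",
    "Spatial_Interaction",
    "Complex_Logic",
    "Perspective_Taking",
}


def parse_record(raw: str) -> dict:
    """Hypothetical helper: validate one record and resolve its answer text."""
    record = json.loads(raw)

    # The id is documented as {image_number}_{question_number}.
    image_number, question_number = record["id"].split("_", 1)

    if record["task_type"] not in TASK_TYPES:
        raise ValueError(f"unknown task_type: {record['task_type']!r}")

    index = record["answer"]
    if not 0 <= index < len(record["options"]):
        raise ValueError(f"answer index {index} is out of range")

    return {
        "image_number": image_number,
        "question_number": question_number,
        # ASSUMPTION: the answer index is zero-based.
        "answer_text": record["options"][index],
        # Documented as not yet provided, so fall back to None when absent.
        "sub_sub_task_type": record.get("sub_sub_task_type"),
    }


parsed = parse_record(RECORD_JSON)
print(parsed)
```

If the index turned out to be one-based instead, only the `record["options"][index]` lookup and the range check would need to change.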

Application scenarios: