OmniSpatial
Source: ModelScope community. Updated 2025-09-20; indexed 2025-08-23.
Download link:
https://modelscope.cn/datasets/Virgo-Internal/OmniSpatial
Description:
# OmniSpatial
This repository contains the data presented in [OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models](https://huggingface.co/papers/2506.03135).
## Task Schema Documentation
This document provides a structured explanation of the task schema for the visual-spatial reasoning benchmark.
---
## Schema Structure
Each question is stored as a JSON object with the following keys:
| Key | Description |
| --------------------- | ------------------------------------------------------------ |
| **id** | Identifier for the question, formatted as `{image_number}_{question_number}`. |
| **question** | The prompt or query that needs to be answered based on visual-spatial reasoning. |
| **options** | A list of possible answer choices for the question. |
| **answer** | The index of the correct answer (Ground Truth, GT) within the `options` list. |
| **task_type** | The main category of the reasoning task, with four types: |
| | - `Dynamic_Reasoning`: Analyzing motion or changes over time. |
| | - `Spatial_Interaction`: Understanding spatial relationships and object interactions. |
| | - `Complex_Logic`: Multi-step logical reasoning involving spatial or interactive elements. |
| | - `Perspective_Taking`: Reasoning about the scene from different viewpoints or observer positions. |
| **sub_task_type** | A more specific categorization of the task, for example, `Motion_Analysis` under `Dynamic_Reasoning`. |
| **sub_sub_task_type** | An additional layer of task categorization, currently not provided but planned for future updates. |
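
The keys above map naturally onto a small typed record. The following is a minimal sketch, not part of the dataset's own tooling: the class name `SpatialQuestion`, the helper `validate`, and the assumption that both halves of `id` are numeric are all illustrative choices rather than anything the schema guarantees.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# The four top-level categories listed in the schema table.
TASK_TYPES = {
    "Dynamic_Reasoning",
    "Spatial_Interaction",
    "Complex_Logic",
    "Perspective_Taking",
}

# Assumption: image_number and question_number are both digit strings,
# as in the example id "15_1". The schema itself only gives the template.
ID_PATTERN = re.compile(r"^\d+_\d+$")


@dataclass
class SpatialQuestion:  # hypothetical name, not defined by the dataset
    id: str
    question: str
    options: List[str]
    answer: int
    task_type: str
    sub_task_type: str
    sub_sub_task_type: Optional[str] = None  # documented as not yet provided


def validate(q: SpatialQuestion) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not ID_PATTERN.match(q.id):
        problems.append(f"unexpected id format: {q.id!r}")
    if q.task_type not in TASK_TYPES:
        problems.append(f"unknown task_type: {q.task_type!r}")
    if not q.options:
        problems.append("options is empty")
    return problems
```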
---
## Example
Below is an example schema instance:
```json
{
  "id": "15_1",
  "question": "If the giraffe on the right reaches the camera in 4 s, what is its speed?",
  "options": [
    "10.9m/s",
    "0.9m/s",
    "35.7m/s",
    "14.7m/s"
  ],
  "answer": 1,
  "task_type": "Dynamic_Reasoning",
  "sub_task_type": "Motion_Analysis"
}
```
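
To show how such a record might be consumed, here is a short sketch that parses the example and maps the `answer` index back to its option text. The schema does not say whether the index is zero-based or one-based, so the helper `resolve_answer` (a hypothetical name) takes the base as an explicit parameter and defaults to zero-based as an assumption. Splitting `id` on its last underscore is likewise an assumption about the identifier format.

```python
import json

# The example record from above, embedded as a string so the sketch is self-contained.
RECORD_JSON = """
{
  "id": "15_1",
  "question": "If the giraffe on the right reaches the camera in 4 s, what is its speed?",
  "options": ["10.9m/s", "0.9m/s", "35.7m/s", "14.7m/s"],
  "answer": 1,
  "task_type": "Dynamic_Reasoning",
  "sub_task_type": "Motion_Analysis"
}
"""


def resolve_answer(record: dict, index_base: int = 0) -> str:
    """Return the option text that the integer `answer` field points at.

    index_base is an assumption: 0 treats `answer` as zero-based;
    pass 1 if the data turns out to be one-based.
    """
    options = record["options"]
    position = record["answer"] - index_base
    if not 0 <= position < len(options):
        raise IndexError(
            f"answer {record['answer']} is out of range for {len(options)} options"
        )
    return options[position]


record = json.loads(RECORD_JSON)

# Split the id into its two parts on the last underscore (assumed separator).
image_number, question_number = record["id"].rsplit("_", 1)

print("image:", image_number, "question:", question_number)
print("ground truth (assuming zero-based index):", resolve_answer(record))
```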
Project Page: https://qizekun.github.io/omnispatial/
GitHub: https://github.com/qizekun/OmniSpatial
Provider:
maas
Created:
2025-08-19



