
OmniSpatial

ModelScope community. Updated 2025-09-20; indexed 2025-08-23.
Download link:
https://modelscope.cn/datasets/Virgo-Internal/OmniSpatial
Description:
# OmniSpatial

This repository contains the data presented in [OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models](https://huggingface.co/papers/2506.03135).

## Task Schema Documentation

This document provides a structured explanation of the task schema for the visual-spatial reasoning benchmark.

---

## Schema Structure

The schema is represented in JSON format, containing the following key components:

| Key                   | Description                                                  |
| --------------------- | ------------------------------------------------------------ |
| **id**                | Identifier for the question, formatted as `{image_number}_{question_number}`. |
| **question**          | The prompt or query that needs to be answered based on visual-spatial reasoning. |
| **options**           | A list of possible answer choices for the question.          |
| **answer**            | The index of the correct answer (Ground Truth, GT) within the `options` list. |
| **task_type**         | The main category of the reasoning task, with four types:    |
|                       | - `Dynamic_Reasoning`: Analyzing motion or changes over time. |
|                       | - `Spatial_Interaction`: Understanding spatial relationships and object interactions. |
|                       | - `Complex_Logic`: Multi-step logical reasoning involving spatial or interactive elements. |
|                       | - `Perspective_Taking`: Reasoning about the scene from different viewpoints or observer positions. |
| **sub_task_type**     | A more specific categorization of the task, for example, `Motion_Analysis` under `Dynamic_Reasoning`. |
| **sub_sub_task_type** | An additional layer of task categorization, currently not provided but planned for future updates. |

---

## Example

Below is an example schema instance:

```json
{
  "id": "15_1",
  "question": "If the giraffe on the right reaches the camera in 4 s, what is its speed?",
  "options": [
    "10.9m/s",
    "0.9m/s",
    "35.7m/s",
    "14.7m/s"
  ],
  "answer": 1,
  "task_type": "Dynamic_Reasoning",
  "sub_task_type": "Motion_Analysis"
}
```

Project Page: https://qizekun.github.io/omnispatial/

Github: https://github.com/qizekun/OmniSpatial
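The schema described above can be checked with a few lines of Python. This is a minimal sketch, not code from the OmniSpatial repository: the helper name `parse_record` and the derived fields `image_number`, `question_number`, and `answer_text` are hypothetical, and the sketch assumes that `answer` is a 0-based index into `options`, which the README does not state explicitly.

```python
import json

# The four task_type values listed in the schema table.
TASK_TYPES = {
    "Dynamic_Reasoning",
    "Spatial_Interaction",
    "Complex_Logic",
    "Perspective_Taking",
}


def parse_record(raw: str) -> dict:
    """Parse one schema instance and attach derived fields (hypothetical helper)."""
    rec = json.loads(raw)

    if rec["task_type"] not in TASK_TYPES:
        raise ValueError(f"unknown task_type: {rec['task_type']!r}")

    # ASSUMPTION: `answer` is a 0-based index into `options`.
    if not 0 <= rec["answer"] < len(rec["options"]):
        raise ValueError(f"answer index out of range: {rec['answer']}")

    # `id` is documented as {image_number}_{question_number}.
    image_number, question_number = rec["id"].split("_", 1)

    return {
        **rec,
        "image_number": image_number,
        "question_number": question_number,
        "answer_text": rec["options"][rec["answer"]],
    }


# The example instance from the README, reproduced verbatim.
EXAMPLE = """
{
  "id": "15_1",
  "question": "If the giraffe on the right reaches the camera in 4 s, what is its speed?",
  "options": ["10.9m/s", "0.9m/s", "35.7m/s", "14.7m/s"],
  "answer": 1,
  "task_type": "Dynamic_Reasoning",
  "sub_task_type": "Motion_Analysis"
}
"""

parsed = parse_record(EXAMPLE)
print(parsed["image_number"], parsed["question_number"], parsed["answer_text"])
```

If the dataset turns out to use 1-based indices, only the range check and the `options[...]` lookup need adjusting; the rest of the sketch is unaffected. The optional `sub_sub_task_type` key is passed through untouched when present.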

Provider:
maas
Created:
2025-08-19