nyu-visionx/VSI-Bench

Name: nyu-visionx/VSI-Bench
Creator: nyu-visionx
Published: 2025-11-11 00:09:48
License: not specified

Source: Hugging Face. Updated: 2025-11-11. Indexed: 2025-04-12.

Download link:

https://hf-mirror.com/datasets/nyu-visionx/VSI-Bench
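
As a minimal sketch, the dataset could be fetched through the mirror above with the Hugging Face `datasets` library. The helper names `dataset_url` and `load_vsi_bench` are hypothetical and exist only for this illustration. The split names and any configuration names are not stated in this listing, so the sketch requests no particular split; if the repository defines several configurations, an extra `name` argument to `load_dataset` may be needed.

```python
import os

# Values taken from the listing above.
MIRROR_ENDPOINT = "https://hf-mirror.com"
REPO_ID = "nyu-visionx/VSI-Bench"


def dataset_url(repo_id: str = REPO_ID, endpoint: str = MIRROR_ENDPOINT) -> str:
    """Build the dataset page URL for a given endpoint (hypothetical helper)."""
    return f"{endpoint}/datasets/{repo_id}"


def load_vsi_bench(split=None, endpoint: str = MIRROR_ENDPOINT):
    """Load the dataset via the mirror (hypothetical helper, needs network access).

    HF_ENDPOINT is read when the Hugging Face libraries are imported, so it is
    set before the import below. An existing HF_ENDPOINT value is left as-is.
    """
    os.environ.setdefault("HF_ENDPOINT", endpoint)
    from datasets import load_dataset  # requires `pip install datasets`

    # With split=None this returns every available split.
    return load_dataset(REPO_ID, split=split)


if __name__ == "__main__":
    print(dataset_url())
    # Uncomment to download; split and column names are not documented here.
    # print(load_vsi_bench())
```

Importing `datasets` inside the function keeps the URL helper usable without the library installed and ensures the endpoint variable is set before the import happens.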


Description:

VSI-Bench is a benchmark for quantitatively evaluating the visual-spatial intelligence of Multimodal Large Language Models (MLLMs) from egocentric video. It contains over 5,000 question-answer pairs derived from 288 real videos. The videos are drawn from the validation sets of three public indoor 3D scene reconstruction datasets: ScanNet, ScanNet++, and ARKitScenes. They cover diverse environments, including residential spaces, professional settings (e.g., offices, labs), and industrial spaces (e.g., factories), across multiple geographic regions. The questions are generated from the accurate object-level annotations in these source datasets, which could also support future studies of the connection between MLLMs and 3D reconstruction.

Provider:

nyu-visionx
