nyu-visionx/VSI-Bench

Source: Hugging Face
Updated: 2025-11-11
Added: 2025-04-12
Download link:
https://hf-mirror.com/datasets/nyu-visionx/VSI-Bench
Resource summary:
VSI-Bench is a benchmark dataset for quantitatively evaluating the visual-spatial intelligence of Multimodal Large Language Models (MLLMs) on egocentric video. It consists of over 5,000 question-answer pairs derived from 288 real videos, sourced from the validation sets of three public indoor 3D scene reconstruction datasets: ScanNet, ScanNet++, and ARKitScenes. The videos cover diverse environments, including residential spaces, professional settings (e.g., offices, labs), and industrial spaces (e.g., factories), and span multiple geographic regions. Questions are generated from the accurate object-level annotations provided by these existing 3D reconstruction and understanding datasets, and those annotations could also support future studies of the connection between MLLMs and 3D reconstruction.
Provider: nyu-visionx