VSI-Bench Visual-Spatial Intelligence Benchmark

超神经 (HyperAI) · Updated 2025-01-08 · Added 2024-12-28

Download link:

https://hyper.ai/cn/datasets/36711


Description:


VSI-Bench (Visual-Spatial Intelligence Benchmark) was released in 2024 by Fei-Fei Li, Saining Xie and their research team. It is designed to evaluate the spatial cognition and understanding capabilities of multimodal large language models (MLLMs) and accompanies the paper "Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces". The dataset contains over 5,000 question-answer pairs drawn from nearly 290 real indoor scene videos, covering environments such as residences, offices and factories. Its questions span spatial tasks including object counting, size and distance estimation, relative direction, route planning and appearance order. This diversity makes the benchmark useful for assessing model robustness and gives developers a broad resource for validating and optimizing algorithms.
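To make the question-answer format concrete, here is a minimal Python sketch of scoring model answers with per-question-type exact-match accuracy. The `QAPair` record, its fields, and the question-type labels (`relative_direction`, `object_count`) are hypothetical illustrations, not the official VSI-Bench schema or evaluation code; the official evaluation may score numeric answers with a different metric.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QAPair:
    # Hypothetical record layout for illustration; not the official VSI-Bench schema.
    video_id: str
    question_type: str
    question: str
    answer: str


def accuracy_by_type(pairs, predictions):
    """Return case-insensitive exact-match accuracy for each question type.

    `predictions` maps an index into `pairs` to the model's answer string.
    Questions with no prediction are counted as wrong.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for i, qa in enumerate(pairs):
        total[qa.question_type] += 1
        pred = predictions.get(i, "")
        # Normalize whitespace and case before comparing.
        if pred.strip().lower() == qa.answer.strip().lower():
            correct[qa.question_type] += 1
    return {t: correct[t] / total[t] for t in total}


pairs = [
    QAPair("scene_01", "relative_direction", "Is the sofa left or right of the TV?", "left"),
    QAPair("scene_01", "relative_direction", "Is the lamp left or right of the bed?", "right"),
    QAPair("scene_02", "object_count", "How many chairs are in the room?", "3"),
]
print(accuracy_by_type(pairs, {0: "Left", 1: "left", 2: "3"}))
# {'relative_direction': 0.5, 'object_count': 1.0}
```

Grouping by question type rather than reporting one overall score makes it easier to see which kinds of spatial reasoning a model struggles with.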

Created:

2024-12-24

Collected Summary

Dataset Introduction

Background and Challenges

Background Overview

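VSI-Bench is a visual-spatial intelligence benchmark developed by the teams of Fei-Fei Li and Saining Xie. It contains over 5,000 question-answer pairs and nearly 290 indoor scene videos and is used to evaluate the spatial cognition of multimodal large language models. The dataset covers a variety of environments and is suited to algorithm validation and optimization.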

The above content was collected and summarized by 遇见数据集.