VideoVista: Benchmarking Diverse and Complex Video-Language Interaction for MLLMs

Name: VideoVista: Benchmarking Diverse and Complex Video-Language Interaction for MLLMs
Creator: Yunxin Li
License: No description provided

Indexed from IEEE on 2026-04-17

Download link:

https://ieee-dataport.org/documents/videovista-benchmarking-diverse-and-complex-video-language-interaction-mllms

Resource description:

We introduce VideoVista, a comprehensive benchmark for evaluating the diverse and complex video-language interaction capabilities of Video-LLMs. In addition, we propose an automated data generation framework to streamline the development of advanced Video-LLMs and to make human annotation in the community more efficient. Specifically, we propose a structured task taxonomy to guide the construction of VideoVista: 1) To assess the comprehensive capabilities of models, we collect 2,619 videos spanning over 154 domains from diverse platforms such as YouTube, Bilibili, and Xiaohongshu, covering content including Science and Technology, Sports, and Entertainment. 2) To evaluate model robustness across temporal scales, the dataset includes videos ranging from one minute to over two hours in length, challenging models on both short- and long-form video processing. 3) We introduce 8 major task categories encompassing 48 subtask types, designed to probe a wide spectrum of abilities, including content understanding and prediction at the object, event, and whole-video levels, English and Chinese cultural contexts, spatial and temporal reasoning, streaming question answering, and others.
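As a rough illustration of how results on a benchmark organized by task category and video duration might be aggregated, the sketch below scores predictions per category and per video-length bucket. It is a minimal sketch under stated assumptions, not the VideoVista evaluation code: the `Item` record layout, the multiple-choice answer format (one correct option letter per question), the category names in the example, and the duration thresholds in `duration_bucket` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical record layout; the actual VideoVista schema may differ.
    video_id: str
    duration_s: float  # video length in seconds
    category: str      # major task category (names here are placeholders)
    subtask: str       # subtask type
    answer: str        # ground-truth option letter, assuming multiple choice
    prediction: str    # option letter chosen by the model


def duration_bucket(seconds: float) -> str:
    """Map a video length to a coarse temporal-scale bucket.

    The 5- and 30-minute thresholds are illustrative assumptions.
    """
    if seconds < 5 * 60:
        return "short (<5 min)"
    if seconds < 30 * 60:
        return "medium (5-30 min)"
    return "long (>=30 min)"


def accuracy_by(items, key):
    """Return {group: accuracy} for items grouped by key(item)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for it in items:
        group = key(it)
        total[group] += 1
        # Case- and whitespace-insensitive match on the option letter.
        correct[group] += int(it.prediction.strip().upper() == it.answer.strip().upper())
    return {group: correct[group] / total[group] for group in total}


if __name__ == "__main__":
    # Toy, made-up items purely to show the call pattern.
    items = [
        Item("v1", 90, "Temporal", "order", "A", "A"),
        Item("v2", 7200, "Temporal", "order", "B", "C"),
        Item("v3", 600, "Spatial", "position", "C", "C"),
    ]
    print(accuracy_by(items, lambda it: it.category))
    print(accuracy_by(items, lambda it: duration_bucket(it.duration_s)))
```

Grouping by `duration_bucket` separates short- and long-video performance, which is the kind of breakdown a benchmark spanning one minute to over two hours is meant to expose; the same `accuracy_by` helper can group by `subtask` instead for a finer-grained view.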

Provider:

Yunxin Li