VideoVista: Benchmarking Diverse and Complex Video-Language Interaction for MLLMs

Source: IEEE, indexed 2026-04-17
Download link:
https://ieee-dataport.org/documents/videovista-benchmarking-diverse-and-complex-video-language-interaction-mllms
Description:
We introduce VideoVista, a comprehensive benchmark for evaluating the diverse and complex video-language interaction capabilities of Video-LLMs. We also propose an automated data generation framework to streamline the development of advanced Video-LLMs and to make human annotation more efficient for the community. A structured task taxonomy guides the construction of VideoVista: 1) To assess the comprehensive capabilities of models, we collect 2,619 videos spanning over 154 domains from diverse platforms (e.g., YouTube, Bilibili, and Xiaohongshu), covering content such as Science and Technology, Sports, and Entertainment. 2) To evaluate model robustness across temporal scales, the dataset includes videos ranging from one minute to over two hours in duration, challenging models in both short- and long-term video processing. 3) We introduce 8 major task categories encompassing 48 subtask types, designed to probe a wide spectrum of abilities, including understanding and prediction of object-level, event-level, and whole-video content; English and Chinese cultural contexts; spatial and temporal reasoning; and streaming question answering, among others.
Provided by:
Yunxin Li