Pey88/MVBench

Name: Pey88/MVBench
Creator: Pey88
Published: 2026-04-11 13:10:18
License: no description provided

Hugging Face, updated 2026-04-11, indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/Pey88/MVBench


Resource overview:

---
license: mit
extra_gated_prompt: >-
  You agree to not use the dataset to conduct experiments that cause harm to human subjects. Please note that the data in this dataset may be subject to other agreements. Before using the data, be sure to read the relevant agreements carefully to ensure compliant use. Video copyrights belong to the original video creators or platforms and are for academic research use only.
task_categories:
- visual-question-answering
- video-classification
extra_gated_fields:
  Name: text
  Company/Organization: text
  Country: text
  E-Mail: text
modalities:
- Video
- Text
configs:
- config_name: action_sequence
  data_files: json/action_sequence.json
- config_name: moving_count
  data_files: json/moving_count.json
- config_name: action_prediction
  data_files: json/action_prediction.json
- config_name: episodic_reasoning
  data_files: json/episodic_reasoning.json
- config_name: action_antonym
  data_files: json/action_antonym.json
- config_name: action_count
  data_files: json/action_count.json
- config_name: scene_transition
  data_files: json/scene_transition.json
- config_name: object_shuffle
  data_files: json/object_shuffle.json
- config_name: object_existence
  data_files: json/object_existence.json
- config_name: fine_grained_pose
  data_files: json/fine_grained_pose.json
- config_name: unexpected_action
  data_files: json/unexpected_action.json
- config_name: moving_direction
  data_files: json/moving_direction.json
- config_name: state_change
  data_files: json/state_change.json
- config_name: object_interaction
  data_files: json/object_interaction.json
- config_name: character_order
  data_files: json/character_order.json
- config_name: action_localization
  data_files: json/action_localization.json
- config_name: counterfactual_inference
  data_files: json/counterfactual_inference.json
- config_name: fine_grained_action
  data_files: json/fine_grained_action.json
- config_name: moving_attribute
  data_files: json/moving_attribute.json
- config_name: egocentric_navigation
  data_files: json/egocentric_navigation.json
language:
- en
size_categories:
- 1K<n<10K
---

# MVBench

## Dataset Description

- **Repository:** [MVBench](https://github.com/OpenGVLab/Ask-Anything/blob/main/video_chat2/mvbench.ipynb)
- **Paper:** [2311.17005](https://arxiv.org/abs/2311.17005)
- **Point of Contact:** [kunchang li](mailto:likunchang@pjlab.org.cn)

## <span style="color: red;">Important Update</span>

[18/10/2024] Due to the NTU RGB+D license, 320 videos from NTU RGB+D need to be downloaded manually. Please visit [ROSE Lab](https://rose1.ntu.edu.sg/dataset/actionRecognition/) to access the data. We also provide a [list of the 320 videos](https://huggingface.co/datasets/OpenGVLab/MVBench/blob/main/video/MVBench_videos_ntu.txt) used in MVBench for your reference.

![images](./assert/generation.png)

We introduce a novel static-to-dynamic method for defining temporal-related tasks. By converting static tasks into dynamic ones, we facilitate systematic generation of video tasks necessitating a wide range of temporal abilities, from perception to cognition. Guided by task definitions, we then **automatically transform public video annotations into multiple-choice QA** for task evaluation. This unique paradigm enables efficient creation of MVBench with minimal manual intervention while ensuring evaluation fairness through ground-truth video annotations and avoiding biased LLM scoring. The **20** temporal task examples are as follows.

![images](./assert/task_example.png)

## Evaluation

An evaluation example is provided in [mvbench.ipynb](https://github.com/OpenGVLab/Ask-Anything/blob/main/video_chat2/mvbench.ipynb). Please follow the pipeline to prepare the evaluation code for various MLLMs.

- **Preprocess**: We preserve the raw video (high resolution, long duration, etc.) along with corresponding annotations (start, end, subtitles, etc.) for future exploration; hence, the decoding of some raw videos, such as those from Perception Test, may be slow.
- **Prompt**: We explore effective system prompts to encourage better temporal reasoning in MLLMs, as well as efficient answer prompts for option extraction.

## Leaderboard

While an [Online leaderboard]() is under construction, the current standings are as follows:

![images](./assert/leaderboard.png)
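The Prompt step in the Evaluation section mentions answer prompts for option extraction. The following is a minimal sketch of that idea, not the code from `mvbench.ipynb`: it assumes each record carries `question` and `candidates` fields, and the helper names `build_prompt` and `extract_option`, the closing instruction sentence, and the regular expressions are all illustrative choices.

```python
import re
import string


def build_prompt(question, candidates):
    """Format a question and its candidate answers as lettered options."""
    lines = [f"Question: {question}", "Options:"]
    for letter, candidate in zip(string.ascii_uppercase, candidates):
        lines.append(f"({letter}) {candidate}")
    # Hypothetical answer instruction; the real prompt wording may differ.
    lines.append("Only give the best option.")
    return "\n".join(lines)


def extract_option(reply, num_options):
    """Return the first valid option letter found in a model reply, or None."""
    valid = string.ascii_uppercase[:num_options]
    # Prefer a parenthesized letter such as "(B)".
    match = re.search(rf"\(([{valid}])\)", reply)
    if match is None:
        # Fall back to a standalone capital letter such as "B" or "B.".
        match = re.search(rf"\b([{valid}])\b", reply)
    return match.group(1) if match else None


prompt = build_prompt("What happened?", ["sat down", "stood up"])
choice = extract_option("Best option: (B) stood up", num_options=2)
```

The fallback pattern is deliberately simple and would misread a reply that begins with the article "A", so a stricter answer prompt or a more conservative parser may be preferable in practice.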

Provider:

Pey88

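Collected summary

Dataset introduction

Construction method

In video understanding, MVBench uses a static-to-dynamic method for defining tasks: by converting static tasks into dynamic ones, it systematically generates video tasks that require a wide range of temporal abilities, from perception to cognition. The dataset builds on public video annotations and automatically converts them into multiple-choice QA for task evaluation. This paradigm keeps construction efficient, while ground-truth video annotations keep evaluation fair and avoid the bias that LLM-based scoring can introduce.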
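To make the annotation-to-QA conversion concrete, here is a minimal sketch of turning one ground-truth label into a multiple-choice item. It is not the authors' generation pipeline: the function name `make_mcq`, the record fields, and the way distractors are supplied are all assumptions for illustration.

```python
import random


def make_mcq(question, correct, distractors, seed=0):
    """Build one multiple-choice item from a ground-truth label and distractors."""
    rng = random.Random(seed)  # fixed seed keeps the option order reproducible
    candidates = [correct, *distractors]
    rng.shuffle(candidates)    # avoid always placing the answer first
    return {"question": question, "candidates": candidates, "answer": correct}


item = make_mcq(
    "What did the person do after opening the door?",
    correct="walked into the room",
    distractors=["closed the window", "sat on the sofa", "picked up a cup"],
)
```

Because the answer comes from the existing annotation rather than from a model's judgment, scoring reduces to an exact comparison against `item["answer"]`.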

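Features

MVBench is known for its broad coverage of temporal tasks. It contains twenty task categories, including action sequence, moving count, scene transition, and counterfactual inference. Each task category has its own config, the modalities are video and text, the dataset size is between 1,000 and 10,000 samples, and the language is English. Its distinguishing feature is the diversity and systematic design of its task definitions, which allow it to evaluate a model's complex temporal reasoning.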
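The twenty per-task configs can be enumerated programmatically. The sketch below copies the config names from the dataset card metadata and assumes the `json/<config_name>.json` layout declared there; the helper `data_file_for` is a hypothetical name introduced for illustration.

```python
# Config names as listed in the dataset card's `configs` metadata.
TASK_CONFIGS = [
    "action_sequence", "moving_count", "action_prediction", "episodic_reasoning",
    "action_antonym", "action_count", "scene_transition", "object_shuffle",
    "object_existence", "fine_grained_pose", "unexpected_action", "moving_direction",
    "state_change", "object_interaction", "character_order", "action_localization",
    "counterfactual_inference", "fine_grained_action", "moving_attribute",
    "egocentric_navigation",
]


def data_file_for(config_name):
    """Return the annotation file path the card declares for a config."""
    if config_name not in TASK_CONFIGS:
        raise ValueError(f"unknown MVBench config: {config_name}")
    return f"json/{config_name}.json"


paths = {name: data_file_for(name) for name in TASK_CONFIGS}
```

With the Hugging Face `datasets` library, a name from this list would typically be passed as the configuration argument of `load_dataset`, assuming access to the gated repository has been granted.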

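Usage

Using MVBench means following its evaluation pipeline. Download the dataset from Hugging Face, and note that some videos must be obtained manually from NTU RGB+D because of licensing. An evaluation example is available in the Jupyter notebook in the project repository and covers two key steps, video preprocessing and prompt design. Preprocessing preserves the raw high-resolution videos and their annotations, while the prompt step explores effective system prompts to strengthen temporal reasoning. By selecting different task configs, users can test multimodal large language models on each temporal task.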
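Once a model has produced an option letter for every question, results can be summarized per task. This is a minimal sketch under stated assumptions: predictions are supplied as `(task, predicted_letter, gold_letter)` tuples, and the overall score is taken as the unweighted mean of per-task accuracies, which may differ from how the official notebook aggregates.

```python
from collections import defaultdict


def per_task_accuracy(results):
    """Compute accuracy per task plus the unweighted mean across tasks.

    `results` is an iterable of (task, predicted_letter, gold_letter) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, predicted, gold in results:
        total[task] += 1
        correct[task] += int(predicted == gold)
    accuracy = {task: correct[task] / total[task] for task in total}
    # Average over tasks, computed before the summary key is added.
    average = sum(accuracy.values()) / len(accuracy) if accuracy else 0.0
    accuracy["average"] = average
    return accuracy


scores = per_task_accuracy([
    ("action_sequence", "A", "A"),
    ("action_sequence", "B", "A"),
    ("scene_transition", "C", "C"),
])
```

A missing prediction, for example when no option letter could be extracted, can be passed as `None` and is simply counted as incorrect.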

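Background and challenges

Background overview

MVBench was released by the OpenGVLab team in 2023 to systematically evaluate the temporal reasoning of multimodal large language models in video understanding. It uses a static-to-dynamic task definition method and extends twenty kinds of temporal tasks from perception to cognition, covering dimensions such as action sequence, scene transition, and counterfactual inference. By automatically converting public video annotations into multiple-choice questions, MVBench reduces manual intervention while keeping evaluation objective. It provides a standardized benchmark for video question answering and classification research and has helped advance temporal reasoning models.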

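Current challenges

The core challenge MVBench targets is comprehensive evaluation of video temporal reasoning. Traditional approaches are often limited to a single task such as action recognition, whereas this dataset must cover abilities across levels, from low-level action counting to high-level counterfactual inference. During construction, the dataset faced compliance issues around video copyright and data access; for example, some videos must be downloaded manually from restricted sources such as NTU RGB+D. In addition, the high resolution and long duration of the raw videos make decoding slow, and automatically generating multiple-choice questions requires balancing task diversity against annotation accuracy to avoid introducing scoring bias.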
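Because the NTU RGB+D videos have to be fetched manually, it can help to check which listed files are still missing before running an evaluation. The sketch below assumes the video list is a plain text file with one relative path per line; the actual format of `MVBench_videos_ntu.txt` is not described here, and the function name `missing_videos` is hypothetical.

```python
from pathlib import Path


def missing_videos(list_file, video_root):
    """Return listed video paths that do not exist under `video_root`.

    Assumes one relative path per line in `list_file`; blank lines are skipped.
    """
    root = Path(video_root)
    missing = []
    for line in Path(list_file).read_text(encoding="utf-8").splitlines():
        relative = line.strip()
        if relative and not (root / relative).is_file():
            missing.append(relative)
    return missing
```

An empty return value indicates that every listed video is present locally.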

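Common scenarios

Typical use cases

In video understanding and the evaluation of multimodal large language models, MVBench provides a comprehensive test bed for temporal reasoning through its static-to-dynamic task conversion. The dataset automatically converts public video annotations into multiple-choice questions covering 20 temporal tasks, including action sequence, scene transition, and counterfactual inference. A typical use is evaluating a model's perception and cognition in video question answering on action prediction, object interaction, and state change, which gives a systematic measure of how deeply a model understands dynamic visual content.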

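Academic problems addressed

MVBench addresses the lack of a standard for evaluating temporal reasoning in video understanding research. Earlier datasets tend to focus on a single task or a limited set of temporal dimensions, whereas MVBench systematically defines a wide range of temporal tasks from perception to cognition, giving multimodal large language models a fair and comprehensive benchmark. Its automatic generation from ground-truth video annotations avoids LLM scoring bias, supports standardization in video temporal understanding, and encourages progress in model reasoning over complex dynamic scenes.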

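Derived related work

The release of MVBench has prompted a series of studies on video temporal understanding. Building on its multi-task evaluation framework, researchers have developed multimodal large language models with stronger temporal modeling, such as VideoChat and VideoLLaMA. The dataset has also inspired further work on long-video understanding and cross-modal alignment, with results published at top venues such as CVPR and ICCV, driving algorithmic innovation and benchmark iteration in subfields such as video question answering and action prediction.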

The content above was collected and summarized by 遇见数据集.
