MVBench
Source: ModelScope community (魔搭社区). Updated 2026-05-17; indexed 2024-08-31.
Download link:
https://modelscope.cn/datasets/modelscope/MVBench
# MVBench
## Dataset Description
- **Repository:** [MVBench](https://github.com/OpenGVLab/Ask-Anything/blob/main/video_chat2/mvbench.ipynb)
- **Paper:** [2311.17005](https://arxiv.org/abs/2311.17005)
- **Point of Contact:** [Kunchang Li](mailto:likunchang@pjlab.org.cn)
## <span style="color: red;">Important Update</span>
[18/10/2024] Due to the NTU RGB+D license, the 320 videos taken from NTU RGB+D must be downloaded manually. Please visit [ROSE Lab](https://rose1.ntu.edu.sg/dataset/actionRecognition/) to request access to the data. We also provide a [list of the 320 videos](https://huggingface.co/datasets/OpenGVLab/MVBench/blob/main/video/MVBench_videos_ntu.txt) used in MVBench for your reference.
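
After downloading the NTU RGB+D videos by hand, it can help to check which of the listed files are still missing locally. The following is a minimal sketch, assuming the list file contains one video name or relative path per line (the exact format of `MVBench_videos_ntu.txt` is not described here); the function names `read_video_list` and `find_missing_videos` are hypothetical helpers, not part of MVBench.

```python
from pathlib import Path


def read_video_list(list_path):
    """Return the non-empty, stripped lines of a video list file."""
    text = Path(list_path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def find_missing_videos(video_names, video_dir):
    """Return the listed videos that are not present under video_dir.

    Only the file name is compared, so entries written as relative paths
    (an assumption about the list format) still match a flat download folder.
    """
    video_dir = Path(video_dir)
    # Collect the names of all files found anywhere below video_dir.
    present = {p.name for p in video_dir.rglob("*") if p.is_file()}
    return [name for name in video_names if Path(name).name not in present]
```

For example, `find_missing_videos(read_video_list("MVBench_videos_ntu.txt"), "video/ntu")` would return the entries that still need to be downloaded, where `video/ntu` stands in for wherever the files were saved.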

We introduce a novel static-to-dynamic method for defining temporal-related tasks. By converting static tasks into dynamic ones, we facilitate systematic generation of video tasks necessitating a wide range of temporal abilities, from perception to cognition. Guided by task definitions, we then **automatically transform public video annotations into multiple-choice QA** for task evaluation. This unique paradigm enables efficient creation of MVBench with minimal manual intervention while ensuring evaluation fairness through ground-truth video annotations and avoiding biased LLM scoring. The **20** temporal task examples are as follows.
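
To illustrate the idea of turning a ground-truth annotation into a multiple-choice question, here is a minimal sketch. It is not the authors' generation pipeline: the function name `build_multiple_choice`, the field names in the returned dictionary, and the strategy of sampling distractors at random from a pool of other labels are all assumptions made for illustration.

```python
import random


def build_multiple_choice(question, answer, label_pool, num_options=4, seed=0):
    """Turn one ground-truth annotation into a multiple-choice QA item.

    The correct answer comes from the annotation; the distractors are
    sampled from the other labels in label_pool (an illustrative choice).
    """
    rng = random.Random(seed)
    # Sort so the result is reproducible for a given seed.
    distractor_pool = sorted(set(label_pool) - {answer})
    if len(distractor_pool) < num_options - 1:
        raise ValueError("not enough distinct labels to build distractors")
    options = rng.sample(distractor_pool, num_options - 1) + [answer]
    rng.shuffle(options)
    return {"question": question, "candidates": options, "answer": answer}
```

Because the correct option is taken directly from the video annotation, scoring reduces to comparing the chosen option with the stored answer, with no LLM judge involved.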

## Evaluation
An evaluation example is provided in [mvbench.ipynb](https://github.com/OpenGVLab/Ask-Anything/blob/main/video_chat2/mvbench.ipynb). Please follow the pipeline to prepare the evaluation code for various MLLMs.
- **Preprocess**: We preserve the raw video (high resolution, long duration, etc.) along with corresponding annotations (start, end, subtitles, etc.) for future exploration; hence, the decoding of some raw videos like Perception Test may be slow.
- **Prompt**: We explore effective system prompts to encourage better temporal reasoning in MLLMs, as well as efficient answer prompts for option extraction.
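
The prompt and option-extraction steps can be sketched roughly as follows. This is a hedged illustration rather than the code in `mvbench.ipynb`: the prompt wording, the helper names `format_question`, `extract_option`, and `accuracy`, and the regular expression are assumptions. The extraction is deliberately crude and would misread a response that begins with the article "A", for instance.

```python
import re
import string


def format_question(question, candidates):
    """Render a question and its options as '(A) ...' lines."""
    lines = [f"Question: {question}", "Options:"]
    for letter, option in zip(string.ascii_uppercase, candidates):
        lines.append(f"({letter}) {option}")
    lines.append("Only give the best option.")  # illustrative wording
    return "\n".join(lines)


def extract_option(response, num_options):
    """Return the first standalone option letter in a response, or None."""
    letters = string.ascii_uppercase[:num_options]
    # Match a single uppercase option letter, optionally in parentheses.
    match = re.search(rf"\(?\b([{letters}])\b\)?", response.strip())
    return match.group(1) if match else None


def accuracy(predictions, answers):
    """Fraction of predicted letters that equal the ground-truth letters."""
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)
```

A response for which no option letter can be extracted yields `None` and is simply counted as incorrect by `accuracy`.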
## Leaderboard
While an online leaderboard is under construction, the current standings are as follows:


Provider: maas
Created: 2024-09-24
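Background overview
MVBench is a dataset for evaluating video understanding. It uses a static-to-dynamic method to define 20 temporal tasks and automatically converts video annotations into multiple-choice QA to keep the evaluation fair. Some of the videos must be downloaded manually, and the dataset card gives guidance on preprocessing and prompts.
The summary above was collected and generated by 遇见数据集.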



