MMBench-Video
Download link:
https://modelscope.cn/datasets/modelscope/MMBench-Video
# MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
- **Homepage:** [https://mmbench-video.github.io/](https://mmbench-video.github.io/)
- **Repository:** [https://huggingface.co/datasets/opencompass/MMBench-Video](https://huggingface.co/datasets/opencompass/MMBench-Video)
- **Paper:** [MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding](https://arxiv.org/abs/2406.14515)
## Table of Contents
- [MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding](#mmbench-video-a-long-form-multi-shot-benchmark-for-holistic-video-understanding)
  - [Table of Contents](#table-of-contents)
  - [Introduction](#introduction)
  - [Leaderboard](#leaderboard)
  - [Data](#data)
    - [How to get video data](#how-to-get-video-data)
  - [Citation](#citation)
  - [License](#license)
## Introduction
MMBench-Video is a quantitative benchmark designed to rigorously evaluate the video understanding proficiency of large vision-language models (LVLMs).
MMBench-Video incorporates approximately 600 web videos with rich context from YouTube, spanning 16 major categories such as News and Sports and covering most video topics people watch in their daily lives. Each video ranges from 30 seconds to 6 minutes in length, so the benchmark can evaluate video understanding on longer videos. The benchmark includes roughly 2,000 original question-answer (QA) pairs contributed by volunteers, covering a total of 26 fine-grained capabilities. It also implements a GPT-4-based evaluation paradigm, which offers superior accuracy, consistency, and closer alignment with human judgments.
## Leaderboard
The latest leaderboard is available at [openvlm_video_leaderboard](https://huggingface.co/spaces/opencompass/openvlm_video_leaderboard).
## Data
The dataset includes 1,998 question-answer (QA) pairs, each assessing one or more capabilities of a vision-language model. Each entry pairs a question with a ground-truth answer.
Here is an example:
```
index: 177
video: DmUgQzu3Z4U
video_type: Food & Drink
question: Did the mint-style guy in the video drink his mouthwash?
answer: Yes, he drank it. This is very strange. Under normal circumstances we are not allowed to drink mouthwash, but this boy may be doing it to attract viewers.
dimensions: ['Counterfactual Reasoning']
video_path: ./video/DmUgQzu3Z4U.mp4
```
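A record like the one above can be handled as a plain dictionary. The sketch below assumes the `dimensions` field arrives as a string holding a Python-style list (as it is printed above) and that records are already loaded into memory. The helper names `parse_dimensions` and `count_by_dimension` are hypothetical and not part of the dataset tooling.

```python
import ast
from collections import Counter

# Example record copied from the sample above (answer text shortened).
record = {
    "index": 177,
    "video": "DmUgQzu3Z4U",
    "video_type": "Food & Drink",
    "question": "Did the mint-style guy in the video drink his mouthwash?",
    "answer": "Yes, he drank it.",
    "dimensions": "['Counterfactual Reasoning']",
    "video_path": "./video/DmUgQzu3Z4U.mp4",
}


def parse_dimensions(value):
    """Return the capability list, accepting a list or its string form."""
    if isinstance(value, str):
        # literal_eval parses Python literals only; it does not execute code.
        value = ast.literal_eval(value)
    return list(value)


def count_by_dimension(records):
    """Count how many questions touch each capability dimension."""
    counts = Counter()
    for rec in records:
        counts.update(parse_dimensions(rec["dimensions"]))
    return counts


print(count_by_dimension([record]))
```

Because one QA pair may cover several capabilities, the per-dimension counts can sum to more than the number of questions.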
### How to get video data
Use the following function to unpack the pkl files and restore the original video data.
```python
import os
import pickle


def unwrap_hf_pkl(pth, suffix='.mp4'):
    """Restore the original video files from the pickles under pth/video_pkl/."""
    base_dir = os.path.join(pth, 'video_pkl/')
    target_dir = os.path.join(pth, 'video/')
    pickle_files = sorted(os.path.join(base_dir, file) for file in os.listdir(base_dir))

    if not os.path.exists(target_dir):
        os.makedirs(target_dir, exist_ok=True)
        for pickle_file in pickle_files:
            # pickle.load can execute arbitrary code; only unpack files from a trusted source.
            with open(pickle_file, 'rb') as file:
                video_data = pickle.load(file)
            # For each video in the pickle file, write its contents to a new file
            for video_name, video_content in video_data.items():
                output_path = os.path.join(target_dir, f'{video_name}{suffix}')
                with open(output_path, 'wb') as output_file:
                    output_file.write(video_content)
        print('The video files have been restored from the pickle files.')
    else:
        print('The video directory already exists.')
```
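After unpacking, it can be useful to confirm that every annotated `video_path` resolves to a file on disk. The sketch below is one way to do that, assuming records are dictionaries with a `video_path` key relative to the dataset root, as in the example record. `find_missing_videos` is a hypothetical helper, and the demo uses placeholder files in a scratch directory rather than real videos.

```python
import os
import tempfile


def find_missing_videos(records, root="."):
    """Return the video_path values that do not exist under root."""
    missing = []
    for rec in records:
        full_path = os.path.normpath(os.path.join(root, rec["video_path"]))
        if not os.path.isfile(full_path):
            missing.append(rec["video_path"])
    return missing


# Demo in a scratch directory: one file present, one absent.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "video"))
    with open(os.path.join(root, "video", "present.mp4"), "wb") as f:
        f.write(b"placeholder bytes, not a real video")
    records = [
        {"video_path": "./video/present.mp4"},
        {"video_path": "./video/absent.mp4"},
    ]
    missing = find_missing_videos(records, root)
    print(missing)
```

An empty result means every referenced video was restored.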
For full dataset evaluation, you can use [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) to run MMBench-Video with a single command.
```bash
python run.py --model GPT4o --data MMBench-Video --nframe 8 --verbose
```
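The `--nframe 8` flag in the command above asks for eight frames per video. How VLMEvalKit selects those frames is not described here. The sketch below shows one common choice, evenly spaced interior positions, purely as an illustration; `uniform_frame_indices` is a hypothetical helper, not a VLMEvalKit function.

```python
def uniform_frame_indices(total_frames, nframe=8):
    """Pick nframe evenly spaced frame indices at interior positions."""
    if total_frames <= 0 or nframe <= 0:
        return []
    # Split the video into nframe + 1 equal segments and take the boundaries.
    step = total_frames / (nframe + 1)
    return [min(int(step * i), total_frames - 1) for i in range(1, nframe + 1)]


# A 900-frame clip sampled at 8 frames.
print(uniform_frame_indices(900, 8))
```

For clips with fewer frames than `nframe`, this sketch repeats some indices instead of failing.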
## Citation
```
@misc{fang2024mmbenchvideolongformmultishotbenchmark,
  title={MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding},
  author={Xinyu Fang and Kangrui Mao and Haodong Duan and Xiangyu Zhao and Yining Li and Dahua Lin and Kai Chen},
  year={2024},
  eprint={2406.14515},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2406.14515},
}
```
If you use VLMEvalKit for model evaluation, please also cite:
```
@misc{duan2024vlmevalkitopensourcetoolkitevaluating,
  title={VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models},
  author={Haodong Duan and Junming Yang and Yuxuan Qiao and Xinyu Fang and Lin Chen and Yuan Liu and Amit Agarwal and Zhe Chen and Mo Li and Yubo Ma and Hailong Sun and Xiangyu Zhao and Junbo Cui and Xiaoyi Dong and Yuhang Zang and Pan Zhang and Jiaqi Wang and Dahua Lin and Kai Chen},
  year={2024},
  eprint={2407.11691},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2407.11691},
}
```
## License
The MMBench-Video dataset is licensed under a
[Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).
Provider: maas
Created: 2024-09-24