VidComposition_Benchmark
VidComposition Benchmark Dataset Overview
Basic Information
- License: Apache-2.0
- Task categories:
  - Question answering
  - Multiple choice
  - Video-text-to-text
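Dataset Description
VidComposition is a benchmark designed to evaluate how well multimodal large language models (MLLMs) understand video composition. The dataset contains 982 videos and 1,706 multiple-choice questions covering the following aspects of composition, among others:
- Camera movement
- Camera angle
- Shot size
- Narrative structure
- Character actions and emotions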
Dataset Format
Each data item is a JSON object with the following structure:

    {
      "video": "0SIK_5qpD70",
      "segment": "0SIK_5qpD70_183.3_225.5.mp4",
      "class": "background_perception",
      "question": "What is the main background in the video?",
      "options": {
        "A": "restaurant",
        "B": "hallway",
        "C": "grassland",
        "D": "wood"
      },
      "id": "1cad95c1-d13a-4ef0-b1c1-f7e753b5122f"
    }
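As a rough illustration of working with one item, the Python sketch below parses the example object, checks that the documented keys are present, and formats the question and options as a text prompt. The helper names (`parse_item`, `split_segment`, `format_prompt`) are hypothetical and not part of the dataset. The assumption that the segment filename encodes `<video>_<start>_<end>.mp4` is inferred from the single example above and is not confirmed by the dataset card.

```python
import json
import os

# Keys that appear in the example item shown on the dataset card.
REQUIRED_KEYS = {"video", "segment", "class", "question", "options", "id"}


def parse_item(raw):
    """Parse one JSON item and verify that the documented keys exist."""
    item = json.loads(raw)
    missing = REQUIRED_KEYS - item.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return item


def split_segment(segment):
    """Split a segment filename into (video_id, start, end).

    ASSUMPTION: the name follows <video>_<start>_<end>.mp4, as the one
    example suggests. Splitting from the right keeps underscores that
    belong to the video id itself.
    """
    stem, _ext = os.path.splitext(segment)
    video_id, start, end = stem.rsplit("_", 2)
    return video_id, float(start), float(end)


def format_prompt(item):
    """Render the question followed by its lettered options."""
    lines = [item["question"]]
    lines += [f"{key}. {text}" for key, text in sorted(item["options"].items())]
    return "\n".join(lines)


EXAMPLE = """
{
  "video": "0SIK_5qpD70",
  "segment": "0SIK_5qpD70_183.3_225.5.mp4",
  "class": "background_perception",
  "question": "What is the main background in the video?",
  "options": {"A": "restaurant", "B": "hallway", "C": "grassland", "D": "wood"},
  "id": "1cad95c1-d13a-4ef0-b1c1-f7e753b5122f"
}
"""

item = parse_item(EXAMPLE)
print(split_segment(item["segment"]))
print(format_prompt(item))
```

Splitting the filename from the right is a deliberate choice here, since the video id in the example (`0SIK_5qpD70`) itself contains an underscore.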
Evaluation
For evaluation, submit a predictions file in the following format:

    [
      {
        "id": "1cad95c1-d13a-4ef0-b1c1-f7e753b5122f",
        "model_answer": "A"
      },
      ...
    ]
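A minimal sketch of producing such a predictions file in Python follows. The function names (`build_predictions`, `save_predictions`) and the `always_a` baseline are hypothetical placeholders for illustration. The check that each answer is one of the item's option letters is an assumption about what a valid submission looks like, not a documented requirement.

```python
import json


def build_predictions(items, answer_fn):
    """Collect one {"id", "model_answer"} record per dataset item.

    answer_fn is any callable that maps an item to an option letter;
    in practice it would wrap a call to the model under evaluation.
    """
    predictions = []
    for item in items:
        answer = answer_fn(item)
        # ASSUMPTION: a valid answer is one of the item's option keys.
        if answer not in item["options"]:
            raise ValueError(f"answer {answer!r} is not an option for {item['id']}")
        predictions.append({"id": item["id"], "model_answer": answer})
    return predictions


def save_predictions(predictions, path):
    """Write the predictions list as a JSON array."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(predictions, f, indent=2)


def always_a(item):
    """Placeholder baseline that always answers "A"."""
    return "A"


sample_items = [
    {
        "id": "1cad95c1-d13a-4ef0-b1c1-f7e753b5122f",
        "question": "What is the main background in the video?",
        "options": {"A": "restaurant", "B": "hallway", "C": "grassland", "D": "wood"},
    }
]

predictions = build_predictions(sample_items, always_a)
print(json.dumps(predictions, indent=2))
```

Calling `save_predictions(predictions, "predictions.json")` would then write the array to disk; the output filename is arbitrary.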
Citation
To cite this dataset, use the following BibTeX entry:

    @article{tang2024vidcompostion,
      title = {VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?},
      author = {Tang, Yunlong and Guo, Junjia and Hua, Hang and Liang, Susan and Feng, Mingqian and Li, Xinyang and Mao, Rui and Huang, Chao and Bi, Jing and Zhang, Zeliang and Fazli, Pooyan and Xu, Chenliang},
      journal = {arXiv preprint arXiv:2411.10979},
      year = {2024}
    }




