ShotBench

ModelScope community. Updated: 2025-12-05. Indexed: 2025-07-12.
Download link:
https://modelscope.cn/datasets/Vchitect/ShotBench
Description:
# ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

This is the official test set of ShotBench, comprising 3,572 question-answer pairs. Each sample is paired with either an image or a video clip. In total, ShotBench includes 3,049 images and 464 videos, primarily sourced from films that received Oscar nominations for Best Cinematography, ensuring high visual quality and strong cinematic style.

- **Paper:** [ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models](https://arxiv.org/abs/2506.21356)
- **Project Page:** [https://vchitect.github.io/ShotBench-project/](https://vchitect.github.io/ShotBench-project/)
- **Code:** [https://github.com/Vchitect/ShotBench](https://github.com/Vchitect/ShotBench)

## Overview

We introduce **ShotBench**, a comprehensive benchmark for evaluating VLMs' understanding of cinematic language. It comprises over 3.5k expert-annotated QA pairs derived from images and video clips of over 200 critically acclaimed films (predominantly Oscar-nominated), covering eight distinct cinematography dimensions. This provides a rigorous new standard for assessing fine-grained visual comprehension in film.

We conducted an extensive evaluation of 24 leading VLMs, including prominent open-source and proprietary models, on ShotBench. Our results reveal a critical performance gap: even the most capable model, GPT-4o, achieves less than 60% average accuracy. This systematically quantifies the current limitations of VLMs in genuine cinematographic comprehension.

To address the identified limitations and facilitate future research, we constructed **ShotQA**, the first large-scale multimodal dataset for cinematography understanding, containing approximately 70k high-quality QA pairs. Leveraging ShotQA, we developed **ShotVL**, a novel VLM trained using Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO). ShotVL significantly surpasses all tested open-source and proprietary models, establishing a new **state-of-the-art** on ShotBench.

## Citation

If you find ShotBench useful for your research, please cite the following paper:

```bibtex
@misc{liu2025shotbench,
  title={ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models},
  author={Hongbo Liu and Jingwen He and Yi Jin and Dian Zheng and Yuhao Dong and Fan Zhang and Ziqi Huang and Yinan He and Yangguang Li and Weichao Chen and Yu Qiao and Wanli Ouyang and Shengjie Zhao and Ziwei Liu},
  year={2025},
  eprint={2506.21356},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.21356},
}
```
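The "average accuracy" quoted in the Overview can be illustrated with a small scoring sketch. This is not the official evaluation code (see the Code repository for that). The record fields `dimension`, `answer`, and `prediction`, the function name `accuracy_by_dimension`, and the placeholder dimension names are assumptions made for illustration. The text above does not say whether the reported average is taken over dimensions or over samples, so the sketch returns both.

```python
from collections import defaultdict


def accuracy_by_dimension(records):
    """Score multiple-choice predictions grouped by dimension.

    `records` is an iterable of dicts with the hypothetical keys
    "dimension", "answer" and "prediction" (not an official schema).
    Returns (per_dimension_accuracy, macro_average, micro_average).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        dim = rec["dimension"]
        total[dim] += 1
        # Compare option letters case-insensitively, ignoring whitespace.
        if rec["prediction"].strip().upper() == rec["answer"].strip().upper():
            correct[dim] += 1

    per_dim = {dim: correct[dim] / total[dim] for dim in total}
    # Macro average: mean of per-dimension accuracies.
    macro = sum(per_dim.values()) / len(per_dim) if per_dim else 0.0
    # Micro average: fraction of all samples answered correctly.
    n = sum(total.values())
    micro = sum(correct.values()) / n if n else 0.0
    return per_dim, macro, micro


# Toy records with placeholder dimension names (not the benchmark's own).
records = [
    {"dimension": "dim_a", "answer": "A", "prediction": "a"},
    {"dimension": "dim_a", "answer": "B", "prediction": "C"},
    {"dimension": "dim_b", "answer": "D", "prediction": "D"},
]
per_dim, macro, micro = accuracy_by_dimension(records)
print(per_dim, macro, micro)
```

The macro and micro averages differ whenever dimensions contain different numbers of questions, so a reported score is only comparable once the averaging convention is known.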

Provider:
maas
Created:
2025-07-07