ACVUBench
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/lark-png/ACVUBench
Resource description:
ACVUBench is an audio-centric video understanding benchmark. It contains 2,662 videos from 18 distinct domains and more than 13,000 high-quality question-answer pairs, each manually annotated or manually verified. The benchmark evaluates how well multimodal large language models understand video when the task depends on auditory information. Beyond the comprehension of standalone audio content, it also tests audio-visual interactions, providing a suite of audio-centric tasks.
Provided by:
ACVUBench team



