H2VU-Benchmark
Source: arXiv | Updated: 2025-05-28 | Indexed: 2025-11-14
Download link:
https://github.com/OPPO-AI-Center/H2VU-Benchmark
Resource overview:
H2VU-Benchmark is a hierarchical, holistic video understanding benchmark built by the OPPO AI Center. It spans the full range of video durations, from 3-second clips to 1.5-hour videos. The dataset contains 10,183 evaluation tasks across two scenarios, offline general-purpose video and online streaming video, and covers 47 core capability dimensions that combine conventional perception and reasoning with newer ones such as counter-commonsense understanding and state trajectory tracking. It was constructed with a three-stage quality-control pipeline that includes optical-flow-based dynamic filtering and dialogue content recognition. The benchmark targets the core weaknesses of existing video understanding models: long-range temporal dependencies, adaptation to dynamic scenes, and first-person streaming video processing. It is intended as a comprehensive evaluation framework for multimodal large language models (MLLMs).
Provided by:
OPPO AI Center
Created:
2025-03-31



