DanBenAmi/HERBench
Source: Hugging Face. Updated 2025-12-18; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/DanBenAmi/HERBench
Description:
HERBench is a challenging benchmark for evaluating vision-language models on multi-evidence integration in long-video question answering. Unlike existing benchmarks, where questions can often be answered from a single frame, HERBench enforces a High Evidential Requirement (ER): each question requires aggregating at least three distinct, temporally separated visual cues (k ≥ 3). The dataset tests temporal, spatial, and causal reasoning, and its questions are designed so that they cannot be answered from isolated frames or limited context. HERBench comes in two versions to suit different storage and compute budgets: Full (27,936 questions across 335 videos) and Lite (5,960 questions across 68 videos). It includes 12 compositional task types spanning temporal reasoning, referring & tracking, global consistency & verification, and multi-entity aggregation & numeracy. Videos are drawn from diverse sources, including WildTrack, HD-EPIC, PersonPath22, and movie trailers.
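To make the download link concrete, here is a minimal Python sketch of how the repository might be inspected and fetched with the `huggingface_hub` library. The repository name and mirror host come from this listing. The helper names (`dataset_url`, `list_files`, `download`) are hypothetical. The listing does not describe the file layout or how the Full and Lite versions are stored, so the sketch lists the files first rather than guessing paths, and it assumes a `huggingface_hub` release whose `snapshot_download` accepts an `endpoint` argument.

```python
"""Sketch: inspect and fetch the HERBench dataset repository.

Assumptions (not confirmed by the listing): the mirror serves the standard
Hugging Face Hub API, and huggingface_hub is installed.
"""

REPO_ID = "DanBenAmi/HERBench"    # repository name from this listing
MIRROR = "https://hf-mirror.com"  # mirror host from the download link


def dataset_url(endpoint: str = MIRROR) -> str:
    """Build the dataset page URL for a given Hub endpoint."""
    return f"{endpoint.rstrip('/')}/datasets/{REPO_ID}"


def list_files(endpoint: str = MIRROR) -> list[str]:
    """Return the file paths in the dataset repository (network call)."""
    # Imported lazily so the module loads without huggingface_hub installed.
    from huggingface_hub import HfApi

    return HfApi(endpoint=endpoint).list_repo_files(REPO_ID, repo_type="dataset")


def download(local_dir: str, endpoint: str = MIRROR) -> str:
    """Download the whole repository into local_dir and return its path.

    This fetches everything in the repository, which may be large;
    check list_files() first to see what it contains.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        local_dir=local_dir,
        endpoint=endpoint,
    )
```

A typical session would call `list_files()` to see what the repository contains, then `download("./herbench")` once the storage cost is acceptable. Passing `endpoint="https://huggingface.co"` targets the primary Hub instead of the mirror.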
Provided by:
DanBenAmi



