Video-Reason/VBVR-Bench-Data

Name: Video-Reason/VBVR-Bench-Data
Creator: Video-Reason
Published: 2026-04-01 10:25:17
License: apache-2.0 (per the dataset card front matter)

Hugging Face | Updated 2026-04-01 | Indexed 2026-04-05

Download link:

https://hf-mirror.com/datasets/Video-Reason/VBVR-Bench-Data

Resource overview:

---
license: apache-2.0
language:
- en
tags:
- video-generation
pretty_name: VBVR-Bench-Data
size_categories:
- n<1K
configs:
- config_name: VBVR-Bench-Data
  data_files:
  - split: test
    path: VBVR-Bench.json
---

# VBVR: A Very Big Video Reasoning Suite

<a href="https://video-reason.com" target="_blank"> <img alt="Project Page" src="https://img.shields.io/badge/Project%20-%20Homepage-4285F4" height="20" /> </a>
<a href="https://github.com/Video-Reason/VBVR-EvalKit" target="_blank"> <img alt="Code" src="https://img.shields.io/badge/Evaluation_code-VBVR_Bench-100000?style=flat-square&logo=github&logoColor=white" height="20" /> </a>
<a href="https://github.com/Video-Reason/VBVR-Wan2.2" target="_blank"> <img alt="Code" src="https://img.shields.io/badge/Training_code-VBVR_Wan2.2-100000?style=flat-square&logo=github&logoColor=white" height="20" /> </a>
<a href="https://github.com/Video-Reason/VBVR-DataFactory" target="_blank"> <img alt="Code" src="https://img.shields.io/badge/Data_code-VBVR_DataFactory-100000?style=flat-square&logo=github&logoColor=white" height="20" /> </a>
<a href="https://huggingface.co/papers/2602.20159" target="_blank"> <img alt="arXiv" src="https://img.shields.io/badge/arXiv-VBVR-red?logo=arxiv" height="20" /> </a>
<a href="https://huggingface.co/Video-Reason/VBVR-Wan2.2" target="_blank"> <img alt="Leaderboard" src="https://img.shields.io/badge/%F0%9F%A4%97%20_VBVR_Wan2.2-Model-ffc107?color=ffc107&logoColor=white" height="20" /> </a>
<a href="https://huggingface.co/datasets/Video-Reason/VBVR-Dataset" target="_blank"> <img alt="Leaderboard" src="https://img.shields.io/badge/%F0%9F%A4%97%20_VBVR_Dataset-Data-ffc107?color=ffc107&logoColor=white" height="20" /> </a>
<a href="https://huggingface.co/spaces/Video-Reason/VBVR-Bench-Leaderboard" target="_blank"> <img alt="Leaderboard" src="https://img.shields.io/badge/%F0%9F%A4%97%20_VBVR_Bench-Leaderboard-ffc107?color=ffc107&logoColor=white" height="20" /> </a>

## Overview

Video reasoning grounds intelligence in spatiotemporally consistent visual environments that go beyond what text can naturally capture, enabling intuitive reasoning over motion, interaction, and causality. Rapid progress in video models has focused primarily on visual quality. Systematically studying video reasoning and its scaling behavior suffers from a lack of video reasoning (training) data. To address this gap, we introduce the Very Big Video Reasoning (VBVR) Dataset, an unprecedentedly large-scale resource spanning 200 curated reasoning tasks and over one million video clips, approximately three orders of magnitude larger than existing datasets. We further present VBVR-Bench, a verifiable evaluation framework that moves beyond model-based judging by incorporating rule-based, human-aligned scorers, enabling reproducible and interpretable diagnosis of video reasoning capabilities. Leveraging the VBVR suite, we conduct one of the first large-scale scaling studies of video reasoning and observe early signs of emergent generalization to unseen reasoning tasks. **Together, VBVR lays a foundation for the next stage of research in generalizable video reasoning.**

## Release Information

We are pleased to release the official **VBVR-Bench** test dataset, designed for standardized and rigorous evaluation of video-based visual reasoning models. The test split is designed along with the evaluation toolkit provided by Video-Reason at [VBVR-Bench evaluation code](https://github.com/Video-Reason/VBVR-Bench). After running evaluation, you can compare your model’s performance on the public leaderboard at [VBVR-Bench Leaderboard](https://huggingface.co/spaces/Video-Reason/VBVR-Bench-Leaderboard).

In this release, we present [**VBVR-Wan2.2**](https://huggingface.co/Video-Reason/VBVR-Wan2.2), [**VBVR-Dataset**](https://huggingface.co/datasets/Video-Reason/VBVR-Dataset), [**VBVR-Bench-Data**](https://huggingface.co/datasets/Video-Reason/VBVR-Bench-Data) and [**VBVR-Bench-Leaderboard**](https://huggingface.co/spaces/Video-Reason/VBVR-Bench-Leaderboard).

## Data Structure

The dataset is organized by domain and task generator. For example:

```bash
In-Domain_50/
  G-31_directed_graph_navigation_data-generator/
    00000/
      first_frame.png
      final_frame.png
      ground_truth.mp4
      prompt.txt
```

Structure Description

- In-Domain_50/Out-of-Domain_50: Evaluation splits indicating whether samples belong to in-domain or out-of-domain settings.
- G-XXX_task-name_data-generator: A specific reasoning task category and its corresponding data generator.
- 00000-00004: Individual sample instances.

Each sample directory contains

- first_frame.png: The initial frame of the video
- final_frame.png: The final frame
- ground_truth.mp4: The full video sequence
- prompt.txt: The textual reasoning question or instruction

## 🖊️ Citation

```bib
@article{vbvr2026,
  title={A Very Big Video Reasoning Suite},
  author={Maijunxian Wang and Ruisi Wang and Juyi Lin and Ran Ji and Thaddäus Wiedemer and Qingying Gao and Dezhi Luo and Yaoyao Qian and Lianyu Huang and Zelong Hong and Jiahui Ge and Qianli Ma and Hang He and Yifan Zhou and Lingzi Guo and Lantao Mei and Jiachen Li and Hanwen Xing and Tianqi Zhao and Fengyuan Yu and Weihang Xiao and Yizheng Jiao and Jianheng Hou and Danyang Zhang and Pengcheng Xu and Boyang Zhong and Zehong Zhao and Gaoyun Fang and John Kitaoka and Yile Xu and Hua Xu and Kenton Blacutt and Tin Nguyen and Siyuan Song and Haoran Sun and Shaoyue Wen and Linyang He and Runming Wang and Yanzhi Wang and Mengyue Yang and Ziqiao Ma and Raphaël Millière and Freda Shi and Nuno Vasconcelos and Daniel Khashabi and Alan Yuille and Yilun Du and Ziming Liu and Bo Li and Dahua Lin and Ziwei Liu and Vikash Kumar and Yijiang Li and Lei Yang and Zhongang Cai and Hokin Deng},
  journal = {arXiv preprint arXiv:2602.20159},
  year = {2026}
}
```
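
The per-sample layout described under Data Structure can be traversed with a short script. The following is a minimal sketch, assuming the dataset has already been downloaded to a local directory that has the split folders (such as `In-Domain_50/`) at its top level. The names `Sample`, `iter_samples`, and `missing_files` are illustrative and not part of any official toolkit.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator

# File names taken from the "Data Structure" section of the dataset card.
EXPECTED_FILES = ("first_frame.png", "final_frame.png", "ground_truth.mp4", "prompt.txt")


@dataclass(frozen=True)
class Sample:
    split: str      # e.g. "In-Domain_50"
    task: str       # e.g. "G-31_directed_graph_navigation_data-generator"
    sample_id: str  # e.g. "00000"
    path: Path      # directory holding the four sample files


def _subdirs(parent: Path) -> list[Path]:
    """Sorted visible subdirectories (hidden ones such as .cache are skipped)."""
    return sorted(p for p in parent.iterdir() if p.is_dir() and not p.name.startswith("."))


def iter_samples(root: Path) -> Iterator[Sample]:
    """Yield every <split>/<task>/<sample_id> directory under root."""
    for split_dir in _subdirs(root):
        for task_dir in _subdirs(split_dir):
            for sample_dir in _subdirs(task_dir):
                yield Sample(split_dir.name, task_dir.name, sample_dir.name, sample_dir)


def missing_files(sample: Sample) -> list[str]:
    """Return the expected file names that are absent from a sample directory."""
    return [name for name in EXPECTED_FILES if not (sample.path / name).is_file()]
```

Calling `missing_files` on each sample from `iter_samples(Path("path/to/local/copy"))` gives a quick integrity check of a download before any evaluation is run.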

Provided by:

Video-Reason

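Summary

Dataset introduction

Construction

Dataset construction in video reasoning has to balance scale against diversity. VBVR-Bench-Data is the benchmark component of the VBVR evaluation suite and is built with a systematic data-factory pipeline. That pipeline defines generators for 200 distinct reasoning tasks and has programmatically produced more than one million video clips for the wider VBVR suite, giving broad task coverage with spatiotemporally consistent content. The benchmark release itself is much smaller: its dataset card lists the size category as n<1K. Each sample contains a first frame, a final frame, the full video sequence, and the corresponding text prompt, and this structured generation is what makes the benchmark verifiable.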

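Usage

Researchers can load VBVR-Bench-Data directly from Hugging Face and run standardized tests with the official evaluation toolkit. Before evaluating, align the model outputs with the ground-truth video sequences and text prompts in the dataset, then run the evaluation script to obtain performance metrics for each reasoning task. Results can then be submitted to the public leaderboard for comparison with other models. Following the same procedure for every model keeps the evaluation rigorous and consistent, and supports progress on core capabilities such as motion understanding, interaction analysis, and causal inference.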
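
The alignment step mentioned above can be sketched as follows. This is only an illustration under a loudly stated assumption: generated videos are stored as `<outputs_root>/<split>/<task>/<sample_id>.mp4`, mirroring the dataset tree. That layout and the function name `pair_outputs` are hypothetical; the official evaluation toolkit may expect a different arrangement, so its documentation should take precedence.

```python
from pathlib import Path


def pair_outputs(dataset_root: Path, outputs_root: Path):
    """Match each benchmark sample with a generated video.

    ASSUMPTION (not from the dataset card): generated videos live at
    <outputs_root>/<split>/<task>/<sample_id>.mp4, mirroring the dataset tree.
    Returns (pairs, missing), where missing lists samples without an output.
    """
    pairs, missing = [], []
    # Every sample directory is expected to contain a prompt.txt file.
    for prompt_file in sorted(dataset_root.glob("*/*/*/prompt.txt")):
        sample_dir = prompt_file.parent
        rel = sample_dir.relative_to(dataset_root)
        generated = outputs_root / rel.parent / f"{rel.name}.mp4"
        if not generated.is_file():
            missing.append(rel.as_posix())
            continue
        pairs.append({
            "sample": rel.as_posix(),
            "prompt": prompt_file.read_text(encoding="utf-8").strip(),
            "ground_truth": sample_dir / "ground_truth.mp4",
            "generated": generated,
        })
    return pairs, missing
```

Checking that `missing` is empty before running the evaluation script avoids scoring a partial set of outputs by accident.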

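Background and challenges

Background overview

Video reasoning is a frontier direction in artificial intelligence. It grounds understanding in spatiotemporally consistent visual environments that go beyond what text naturally captures, enabling intuitive reasoning over motion, interaction, and causality. The field has long been held back by a shortage of large-scale training data, which has limited systematic study and the exploration of model generalization. To address this, the Video-Reason team released the VBVR dataset in 2026. It covers 200 curated reasoning tasks and over one million video clips, roughly three orders of magnitude larger than existing resources, and provides a basis for studying how video reasoning scales.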

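Current challenges

Video reasoning has long been limited by insufficient data scale, which makes it hard to evaluate models systematically, or to study their generalization, on tasks involving complex spatiotemporal logic and causal inference. Building the VBVR dataset involved engineering challenges such as multimodal data alignment, high-quality video annotation, and the design of diverse tasks, all while keeping the data rigorous in spatiotemporal consistency, semantic richness, and task coverage. In addition, the evaluation framework has to move beyond conventional model-based judging and adopt rule-based, human-aligned scorers so that the diagnosis of video reasoning capabilities is reproducible and interpretable.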

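Common scenarios

Typical use cases

In video generation and understanding, VBVR-Bench-Data serves as a benchmark dataset. Its typical use is the systematic evaluation of how well video reasoning models generalize and adapt to different tasks. Drawing on the VBVR suite's 200 curated reasoning tasks, it covers core dimensions such as spatiotemporal consistency, motion and interaction, and causal inference, giving researchers a standardized evaluation framework. A model's results on the benchmark indicate how logically coherent and how deep its reasoning is when handling complex video semantics, which makes the benchmark a useful yardstick for progress in video intelligence.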

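Academic problems addressed

The dataset addresses two problems in video reasoning research: scarce data and inconsistent evaluation standards. Earlier video datasets mostly emphasize visual quality and do not systematically test reasoning ability. VBVR-Bench-Data, backed by the million-clip VBVR Dataset, provides an evaluation setup that covers both in-domain and out-of-domain tasks. Its rule-based, human-aligned scoring enables reproducible and interpretable diagnosis of video reasoning capabilities and provides a data foundation for studying scaling behavior and emergent generalization in video models.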

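Practical applications

In practice, VBVR-Bench-Data offers a capability check for scenarios that require high-level video understanding, such as autonomous driving, robot interaction, and intelligent surveillance. Evaluating how accurately a model reasons about object motion, event causality, and interaction intent in dynamic environments helps identify video models reliable enough for real-world tasks. The standardized test procedure also helps industry teams tune model deployment and improve the robustness and safety of decisions in complex visual environments.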

Recent research on the dataset