Video-Reason/VBVR-Dataset

Name: Video-Reason/VBVR-Dataset
Creator: Video-Reason
Published: 2026-04-01 10:27:10
License: Apache 2.0

Hugging Face · Updated 2026-04-01 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/Video-Reason/VBVR-Dataset


Resource description:

---
license: apache-2.0
task_categories:
- video-classification
- visual-question-answering
- video-text-to-text
language:
- en
tags:
- video-reasoning
- video-generation
- visual-reasoning
- benchmark
- spatiotemporal
- VBVR
size_categories:
- 1M<n<10M
pretty_name: "VBVR-Dataset: Very Big Video Reasoning Training Data"
dataset_info:
  features:
  - name: id
    dtype: int64
  - name: generator
    dtype: string
  - name: task
    dtype: string
  - name: sample_id
    dtype: string
  - name: prompt
    dtype: string
  - name: metadata_json
    dtype: string
  - name: first_frame_path
    dtype: string
  - name: final_frame_path
    dtype: string
  - name: ground_truth_video_path
    dtype: string
  - name: tar_file
    dtype: string
  splits:
  - name: train
    num_examples: 1000000
configs:
- config_name: default
  data_files:
  - split: train
    path: "data/metadata.parquet"
---

# VBVR-Dataset: Very Big Video Reasoning Training Data

<a href="https://video-reason.com" target="_blank">
  <img alt="Project Page" src="https://img.shields.io/badge/Project%20-%20Homepage-4285F4" height="20" />
</a>
<a href="https://github.com/Video-Reason/VBVR-EvalKit" target="_blank">
  <img alt="Code" src="https://img.shields.io/badge/Evaluation_code-VBVR_Bench-100000?style=flat-square&logo=github&logoColor=white" height="20" />
</a>
<a href="https://github.com/Video-Reason/VBVR-Wan2.2" target="_blank">
  <img alt="Code" src="https://img.shields.io/badge/Training_code-VBVR_Wan2.2-100000?style=flat-square&logo=github&logoColor=white" height="20" />
</a>
<a href="https://github.com/Video-Reason/VBVR-DataFactory" target="_blank">
  <img alt="Code" src="https://img.shields.io/badge/Data_code-VBVR_DataFactory-100000?style=flat-square&logo=github&logoColor=white" height="20" />
</a>
<a href="https://huggingface.co/papers/2602.20159" target="_blank">
  <img alt="arXiv" src="https://img.shields.io/badge/arXiv-VBVR-red?logo=arxiv" height="20" />
</a>
<a href="https://huggingface.co/Video-Reason/VBVR-Wan2.2" target="_blank">
  <img alt="Leaderboard" src="https://img.shields.io/badge/%F0%9F%A4%97%20_VBVR_Wan2.2-Model-ffc107?color=ffc107&logoColor=white" height="20" />
</a>
<a href="https://huggingface.co/datasets/Video-Reason/VBVR-Bench-Data" target="_blank">
  <img alt="Bench Data" src="https://img.shields.io/badge/%F0%9F%A4%97%20_VBVR_Bench-Data-ffc107?color=ffc107&logoColor=white" height="20" />
</a>
<a href="https://huggingface.co/spaces/Video-Reason/VBVR-Bench-Leaderboard" target="_blank">
  <img alt="Leaderboard" src="https://img.shields.io/badge/%F0%9F%A4%97%20_VBVR_Bench-Leaderboard-ffc107?color=ffc107&logoColor=white" height="20" />
</a>

## Overview

**VBVR-Dataset** is an unprecedentedly large-scale video reasoning training resource, part of the **Very Big Video Reasoning (VBVR) Suite**. This release contains the **training split**: **100 curated reasoning task generators** with **1,000,000 video clips** (10,000 samples per generator), with each sample consisting of a video, start/end frames, a textual reasoning prompt, and structured metadata.

This dataset is designed to support large-scale training and scaling studies of reasoning capabilities in video generation models.

## Key Statistics

| Property | Value |
|---|---|
| **Total samples** | 1,000,000 |
| **Task generators** | 100 |
| **Samples per generator** | 10,000 |
| **Files per sample** | 5 (first_frame.png, final_frame.png, ground_truth.mp4, metadata.json, prompt.txt) |
| **Total files** | 5,000,000 |
| **Total size (compressed)** | ~370 GB (100 tar files) |
| **Video format** | MP4 |
| **Image format** | PNG |
| **Language** | English |
| **License** | Apache 2.0 |

## Dataset Structure

### Browsable Metadata

The `data/metadata.parquet` file contains 1,000,000 rows with the following columns, viewable directly in the HF Dataset Viewer:

| Column | Type | Description |
|---|---|---|
| `id` | int64 | Global unique sample index (0–999,999) |
| `generator` | string | Generator name (e.g., `G-11_handle_object_reappearance_data-generator`) |
| `task` | string | Task name within the generator |
| `sample_id` | string | Sample identifier (e.g., `handle_object_reappearance_00000000`) |
| `prompt` | string | The textual reasoning question or instruction |
| `metadata_json` | string | JSON string with generation parameters, seed, and task-specific configs |
| `first_frame_path` | string | Relative path to the first frame PNG within the tar |
| `final_frame_path` | string | Relative path to the final frame PNG within the tar |
| `ground_truth_video_path` | string | Relative path to the ground truth MP4 within the tar |
| `tar_file` | string | Which tar file contains this sample (e.g., `tars/G-11_handle_object_reappearance_data-generator.tar`) |

### Tar Files

The actual video/image data is stored as **100 individual tar files** in the `tars/` directory, one per generator.

Each tar contains the full directory structure:

```
<generator_name>/
  <task_name>/
    <sample_id>/
      first_frame.png      # Initial frame of the video
      final_frame.png      # Final frame of the video
      ground_truth.mp4     # Full video sequence (ground truth)
      metadata.json        # Structured generation metadata
      prompt.txt           # Textual reasoning prompt
```

## Usage

### Browse Metadata (No Download Required)

The metadata is directly viewable in the Dataset Viewer tab above. You can explore prompts, task types, and sample distributions without downloading anything.

### Load Metadata with `datasets`

```python
from datasets import load_dataset

ds = load_dataset("Video-Reason/VBVR-Dataset", split="train")
print(f"Total samples: {len(ds)}")
print(ds[0])  # View first sample metadata
```

### Download Specific Tar Files

```python
import tarfile

from huggingface_hub import hf_hub_download

# Download a specific generator's tar
tar_path = hf_hub_download(
    repo_id="Video-Reason/VBVR-Dataset",
    filename="tars/G-11_handle_object_reappearance_data-generator.tar",
    repo_type="dataset",
)

# Extract
with tarfile.open(tar_path) as tar:
    tar.extractall("./data")
```

### Download All Tar Files

```bash
# Using huggingface-cli
huggingface-cli download Video-Reason/VBVR-Dataset --include "tars/*.tar" --repo-type dataset --local-dir ./vbvr-data

# Or using git lfs
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/Video-Reason/VBVR-Dataset
cd VBVR-Dataset
git lfs pull --include "tars/*.tar"
```

## Complete List of Training Set Generators

<details>
<summary>Click to expand full list of 100 training set generators</summary>

**Geometry & Graph (G-series, 63 generators):**
G-1, G-2, G-3, G-4, G-5, G-6, G-7, G-8, G-9, G-11, G-12, G-13, G-14, G-15, G-16, G-17, G-18, G-19, G-21, G-22, G-25, G-26, G-27, G-29, G-30, G-31, G-32, G-33, G-34, G-35, G-36, G-37, G-38, G-39, G-40, G-41, G-43, G-44, G-45, G-46, G-48, G-49, G-50, G-51, G-131, G-132, G-133, G-134, G-137, G-138, G-141, G-143, G-146, G-158, G-162, G-163, G-165, G-166, G-194, G-195, G-198, G-199, G-200

**Object & Physics (O-series, 37 generators):**
O-1, O-3, O-4, O-7, O-8, O-10, O-12, O-13, O-14, O-15, O-16, O-17, O-18, O-19, O-21, O-23, O-24, O-25, O-29, O-30, O-31, O-32, O-33, O-34, O-36, O-37, O-38, O-44, O-45, O-47, O-52, O-53, O-55, O-66, O-75, O-83, O-87

</details>

---

## Links

- **Website**: [Video-Reason.com](https://video-reason.com/)
- **Paper**: [A Very Big Video Reasoning Suite](https://arxiv.org/abs/2602.20159v1)
- **Slack**: [Join our workspace](https://join.slack.com/t/video-reason/shared_invite/zt-3qqf23icm-UC29fatWWYsIuzRNBR1lgg)
- **HuggingFace**: [Video-Reason](https://huggingface.co/Video-Reason)
- **Contact**: [hokinxqdeng@gmail.com](mailto:hokinxqdeng@gmail.com)

---

## Citation

If you use VBVR in your research, please cite:

```bibtex
@article{vbvr2026,
  title   = {A Very Big Video Reasoning Suite},
  author  = {Wang, Maijunxian and Wang, Ruisi and Lin, Juyi and Ji, Ran and
             Wiedemer, Thadd{\"a}us and Gao, Qingying and Luo, Dezhi and
             Qian, Yaoyao and Huang, Lianyu and Hong, Zelong and Ge, Jiahui and
             Ma, Qianli and He, Hang and Zhou, Yifan and Guo, Lingzi and
             Mei, Lantao and Li, Jiachen and Xing, Hanwen and Zhao, Tianqi and
             Yu, Fengyuan and Xiao, Weihang and Jiao, Yizheng and
             Hou, Jianheng and Zhang, Danyang and Xu, Pengcheng and
             Zhong, Boyang and Zhao, Zehong and Fang, Gaoyun and
             Kitaoka, John and Xu, Yile and Xu, Hua and Blacutt, Kenton and
             Nguyen, Tin and Song, Siyuan and Sun, Haoran and Wen, Shaoyue and
             He, Linyang and Wang, Runming and Wang, Yanzhi and
             Yang, Mengyue and Ma, Ziqiao and Milli{\`e}re, Rapha{\"e}l and
             Shi, Freda and Vasconcelos, Nuno and Khashabi, Daniel and
             Yuille, Alan and Du, Yilun and Liu, Ziming and Lin, Dahua and
             Liu, Ziwei and Kumar, Vikash and Li, Yijiang and Yang, Lei and
             Cai, Zhongang and Deng, Hokin},
  journal = {arXiv preprint arXiv:2602.20159},
  year    = {2026},
  url     = {https://arxiv.org/abs/2602.20159}
}
```

## License

This dataset is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
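
When only a handful of samples are needed, unpacking an entire generator tar may be unnecessary. The sketch below reads selected files directly out of a downloaded tar. The helper name `read_sample_members` is hypothetical, and it assumes the member names inside the tar match the relative paths stored in the metadata columns (`first_frame_path`, `final_frame_path`, `ground_truth_video_path`); if the archive prefixes names differently (for example with `./`), the paths would need normalizing first.

```python
import tarfile


def read_sample_members(tar_path, member_paths):
    """Return {member_path: bytes} for the requested files inside one tar.

    Assumption: member_paths are exactly the names stored in the archive.
    """
    wanted = set(member_paths)
    found = {}
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if member.isfile() and member.name in wanted:
                found[member.name] = tar.extractfile(member).read()
                if len(found) == len(wanted):
                    break  # stop scanning once every requested file is read
    missing = wanted - set(found)
    if missing:
        raise KeyError(f"not found in {tar_path}: {sorted(missing)}")
    return found
```

A plain tar has no index, so each call scans the archive sequentially; for many samples from the same generator, full extraction as shown in the card is likely the better choice.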

Provider:

Video-Reason

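Collected Summary

Dataset Introduction

Construction

In video reasoning, building large-scale, high-quality training data is key to extending what models can do. VBVR-Dataset is built with a systematic data-factory pipeline around 100 carefully designed task generators, each focused on a specific geometry, graph, or object-physics reasoning scenario. The generators synthesize video sequences procedurally, so every sample captures a complete spatiotemporal evolution and comes with structured metadata. The dataset contains one million samples, split across tar files. Each sample provides a first frame, a final frame, the ground-truth video, and a text prompt, giving a clearly layered data layout that is easy to extend.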
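
The layered layout described in the dataset card (`<generator_name>/<task_name>/<sample_id>/` with five files per sample) can be sketched as a small path builder. This is an illustration, not code from the VBVR repositories: `sample_member_paths` and `expected_total_files` are hypothetical helpers, and whether the path columns in `data/metadata.parquet` match these strings exactly is an assumption.

```python
import posixpath

# The five files listed per sample in the dataset card.
SAMPLE_FILES = (
    "first_frame.png",
    "final_frame.png",
    "ground_truth.mp4",
    "metadata.json",
    "prompt.txt",
)


def sample_member_paths(generator, task, sample_id):
    """Map each file name to its expected path inside the generator tar."""
    base = posixpath.join(generator, task, sample_id)
    return {name: posixpath.join(base, name) for name in SAMPLE_FILES}


def expected_total_files(generators=100, samples_per_generator=10_000):
    """Reproduce the card's arithmetic: generators x samples x files."""
    return generators * samples_per_generator * len(SAMPLE_FILES)
```

With the defaults, `expected_total_files()` gives 100 × 10,000 × 5 = 5,000,000, matching the "Total files" row in the card's statistics table.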

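Features

The dataset's most distinctive traits are its scale and diversity: it covers complex reasoning tasks such as geometric transformations, object interactions, and physical motion. Each sample provides a video sequence together with an aligned text prompt and detailed generation-parameter metadata, which supports multimodal learning. Data is organized per generator, so the distribution of task types is explicit and controlled scaling studies and capability evaluations are straightforward to set up. The dataset is openly licensed and accompanied by an evaluation toolchain and a benchmark leaderboard, providing infrastructure for training and evaluating reasoning in video generation models.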
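
Because samples are grouped by generator, a controlled scaling study can draw the same number of samples from every generator. The following is a minimal sketch under stated assumptions: `balanced_subset` is a hypothetical helper, and `rows` is any iterable of dictionaries carrying the `id` and `generator` columns from the metadata (for example, rows yielded while iterating over the loaded train split).

```python
import random
from collections import defaultdict


def balanced_subset(rows, per_generator, seed=0):
    """Pick `per_generator` sample ids from each generator, reproducibly."""
    ids_by_generator = defaultdict(list)
    for row in rows:
        ids_by_generator[row["generator"]].append(row["id"])

    rng = random.Random(seed)  # fixed seed so the subset can be regenerated
    picked = []
    for generator in sorted(ids_by_generator):
        ids = sorted(ids_by_generator[generator])
        if per_generator > len(ids):
            raise ValueError(
                f"{generator} has only {len(ids)} samples, "
                f"cannot draw {per_generator}"
            )
        picked.extend(rng.sample(ids, per_generator))
    return sorted(picked)
```

Sorting the generators and ids before sampling keeps the result independent of row order, so two runs with the same seed select the same ids.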

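Usage

Researchers can load the metadata directly with the Hugging Face `datasets` library and explore it without downloading the large media files. For a specific task, the tar archive of the corresponding generator can be downloaded on demand, or all archives can be fetched in bulk with the command-line tool. Once loaded, the data can be used with the VBVR-Wan2.2 training code to train models and with the VBVR-Bench evaluation suite to measure model performance. The structured design is intended to fit into existing video generation and reasoning workflows, from metadata analysis through end-to-end model training.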
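
To download only the archives a task needs, the metadata can first be reduced to the set of tar files for the generators of interest. A minimal sketch, assuming `rows` is an iterable of dictionaries with the `generator` and `tar_file` columns described in the card; the helper name `tar_files_for` is hypothetical.

```python
def tar_files_for(rows, generator_prefix):
    """Return the sorted, de-duplicated tar files whose generator name
    starts with `generator_prefix` (e.g. "O-" for the O-series)."""
    return sorted(
        {
            row["tar_file"]
            for row in rows
            if row["generator"].startswith(generator_prefix)
        }
    )
```

Each returned value can then be passed as the `filename` argument of `hf_hub_download`, as in the card's single-tar example. Note that a prefix such as `"G-1"` also matches `G-11`, `G-12`, and so on; include the trailing underscore (`"G-1_"`) to select a single generator.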

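Background and Challenges

Background

As video generation models move toward complex reasoning, conventional datasets fall short in scale and task diversity. VBVR-Dataset is the core training resource of the Very Big Video Reasoning (VBVR) Suite, released by the Video-Reason team in 2026. It aims to support systematic training of spatiotemporal reasoning through one million video clips paired with text prompts. The dataset spans 100 generators covering geometry, graphs, and object physics, all built with a strictly procedural generation pipeline. It is presented as the first large-scale, structured benchmark for video reasoning, intended to advance research on dynamic scene understanding in generative AI.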

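Current Challenges

Video reasoning has long struggled with modeling spatiotemporal relations in dynamic scenes: a model must not only recognize objects but also understand their interactions, motion, and causal logic. VBVR-Dataset addresses this with diverse generated tasks that probe abilities such as handling long-range dependencies and respecting physical laws. During construction, the team had to solve quality control for procedural generation, holding one million videos to a high standard of visual realism and logical consistency. They also had to handle storage, indexing, and distributed access for a very large volume of multimedia data so that the dataset remains structured and usable.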

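Common Scenarios

Typical Use Cases

In video generation and reasoning, the typical use of VBVR-Dataset as a large-scale training resource is end-to-end training and evaluation of video generation models. Its one million video samples and matching text reasoning prompts give models a wide range of spatiotemporal reasoning tasks, such as predicting object trajectories and understanding geometric transformations of a scene. Using the structured metadata and the alignment between modalities, researchers can systematically study how well models generalize and how accurately they reason when generating complex video sequences.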

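Practical Applications

In practice, VBVR-Dataset can supply training and test data for scenarios such as autonomous-driving simulation, robot motion planning, and intelligent video editing. In autonomous driving, for example, tasks such as object reappearance and trajectory prediction can help a model learn to reason about how complex traffic scenes evolve over time. In film and video production, the dataset supports generating dynamic effects that obey physical laws, improving the quality of automatically generated content. Its video samples and rich metadata annotations help industry build more reliable and interpretable video understanding and generation systems.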

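Derived and Related Work

Several works have grown around VBVR-Dataset, including the VBVR-Wan2.2 video generation model, the VBVR-Bench evaluation benchmark, and the accompanying DataFactory toolchain. Together they form a pipeline from data synthesis through model training to performance evaluation. VBVR-Wan2.2, a large-scale generative model trained on this dataset, is reported to set new performance records on several video reasoning tasks. VBVR-Bench provides a standardized evaluation protocol and leaderboard to support continued progress on video reasoning in the community.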

The content above was collected and summarized by 遇见数据集.
