VincentHancoder/ViGoR-Bench

Name: VincentHancoder/ViGoR-Bench
Creator: VincentHancoder
Published: 2026-03-24 11:26:22
License: No description available

Hugging Face, updated 2026-03-24, indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/VincentHancoder/ViGoR-Bench


Resource description:

---
license: cc-by-nc-4.0
task_categories:
- image-to-image
- visual-question-answering
tags:
- benchmark
- reasoning
- vision
- generative-model
- evaluation
pretty_name: ViGoR-Bench
size_categories:
- 1K<n<10K
---

<div align="center">

<h1>ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?</h1>

[![Dataset](https://img.shields.io/badge/🤗%20Hugging%20Face-Dataset-blue)](https://huggingface.co/datasets/VincentHancoder/ViGoR-Bench)
[![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)

<img src="ViGoR_overview.png" alt="ViGoR-Bench Overview" width="360"/>

</div>

---

## 🔍 Overview

**ViGoR-Bench** (**Vi**sion-**G**enerative **R**easoning-centric Benchmark) is a unified evaluation framework designed to stress-test the reasoning capabilities of visual generative models. Beneath the stunning visual fidelity of modern AIGC models lies a *logical desert*: these systems frequently fail tasks that require physical, causal, or complex spatial reasoning. Existing evaluations, which rely on superficial metrics or fragmented benchmarks, create a *performance mirage* that overlooks the generative process. ViGoR-Bench dismantles this mirage through:

- **Holistic Cross-Modal Coverage**: bridging Image-to-Image and Video generation tasks.
- **Dual-Track Evaluation**: assessing both intermediate reasoning processes and final outputs.
- **Evidence-Grounded Automated Judge**: ensuring high alignment with human judgment.
- **Granular Diagnostic Analysis**: decomposing performance into fine-grained cognitive dimensions.

Experiments on **20+ leading models** reveal that even state-of-the-art systems harbor significant reasoning deficits, establishing ViGoR-Bench as a critical stress test for the next generation of intelligent vision models.
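The card does not specify the judge's output format. As a purely illustrative sketch, assuming each judged case yields one pass/fail verdict for the reasoning process and one for the final output, a dual-track, per-dimension breakdown could be computed as below. `CaseVerdict`, its fields, and `dual_track_breakdown` are hypothetical names; only the category and subcategory strings come from the dataset's folder layout.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CaseVerdict:
    """One judged case. These field names are hypothetical, not ViGoR-Bench's schema."""
    category: str     # top-level dimension, e.g. "Symbolic_Reasoning"
    subcategory: str  # fine-grained dimension, e.g. "Sudoku"
    process_ok: bool  # assumed judge verdict on the intermediate reasoning process
    output_ok: bool   # assumed judge verdict on the final output


def dual_track_breakdown(verdicts):
    """Map (category, subcategory) -> (process accuracy, output accuracy)."""
    groups = defaultdict(list)
    for v in verdicts:
        groups[(v.category, v.subcategory)].append(v)
    breakdown = {}
    for key, cases in groups.items():
        n = len(cases)
        # The two tracks are averaged independently, so they can diverge,
        # e.g. a correct final output reached through a flawed process.
        breakdown[key] = (
            sum(c.process_ok for c in cases) / n,
            sum(c.output_ok for c in cases) / n,
        )
    return breakdown


verdicts = [
    CaseVerdict("Symbolic_Reasoning", "Sudoku", process_ok=True, output_ok=False),
    CaseVerdict("Symbolic_Reasoning", "Sudoku", process_ok=True, output_ok=True),
    CaseVerdict("Knowledge_Reasoning", "Physics", process_ok=False, output_ok=False),
]
print(dual_track_breakdown(verdicts))
# {('Symbolic_Reasoning', 'Sudoku'): (1.0, 0.5), ('Knowledge_Reasoning', 'Physics'): (0.0, 0.0)}
```

Keeping the two scores separate, rather than averaging them into one number, is what lets a diagnostic breakdown show where process and outcome disagree.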
---

## 📂 Dataset Structure

```
ViGoR-Bench/
├── README.md
├── statistics.json
│
├── Physical_Reasoning/
│   ├── Sorting_and_Categorization/
│   │   ├── records.json
│   │   ├── input_XXXX.png
│   │   └── ...
│   ├── Situational_Decision_Making/
│   ├── Attribute_Recognition/
│   ├── Object_Assembly/
│   ├── Spatial_Reasoning/
│   └── Measurement_and_Verification/
│
├── Knowledge_Reasoning/
│   ├── Common_Sense/
│   ├── Geography/
│   ├── Biology/
│   ├── Physics/
│   ├── Sports/
│   ├── Chemistry/
│   └── History/
│
└── Symbolic_Reasoning/
    ├── Block_Building/
    ├── Algebraic_Calculation/
    ├── Function_Plotting/
    ├── Jigsaw_Puzzle/
    ├── Klotski_Puzzle/
    ├── Maze_Navigation/
    └── Sudoku/
```

Each subcategory folder contains:

- **`records.json`**: Ground-truth annotations for all cases in that subcategory.
- **`input_XXXX.png`**: Input images provided to the model.
- **`output_XXXX.png`**: Reference ground-truth images (where applicable).

---

## 📝 Annotation Format

Each `records.json` is a JSON array with one entry per case. The annotation fields are described below:

| Field | Description |
|---|---|
| `id` | Unique case identifier |
| `input_image` | Filename of the input image |
| `edit_instruction` | Task instruction given to the generative model |
| `ref_text` | Textual description of the expected output (ground truth) |
| `output_image` | Filename of the reference ground-truth image (if available) |

---

## Citation

If you find ViGoR-Bench useful, please cite our paper:

```bibtex
@article{vigor2025,
  title={ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?},
  author={},
  year={2025}
}
```

---

## License

This dataset is released under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.
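---

## Example: Reading the Annotations

To illustrate the directory layout and `records.json` fields described in this card, here is a minimal Python sketch that walks a local copy of the dataset and yields one entry per case. It assumes the dataset has already been downloaded to a local folder (the `ViGoR-Bench` path below is a placeholder), that each `records.json` holds an array of JSON objects, and that every documented field except `output_image` is always present. `iter_cases` and `REQUIRED_FIELDS` are illustrative names, not part of the dataset.

```python
import json
from collections import Counter
from pathlib import Path

# Fields from the Annotation Format table that every case is assumed to carry;
# `output_image` is treated as optional because reference images exist only where applicable.
REQUIRED_FIELDS = ("id", "input_image", "edit_instruction", "ref_text")


def iter_cases(root):
    """Yield (category, subcategory, record, input_path, output_path) per case.

    Assumes the <root>/<Category>/<Subcategory>/records.json layout.
    output_path is None when a case has no reference image.
    """
    for records_file in sorted(Path(root).glob("*/*/records.json")):
        folder = records_file.parent
        records = json.loads(records_file.read_text(encoding="utf-8"))
        for rec in records:
            missing = [f for f in REQUIRED_FIELDS if f not in rec]
            if missing:
                raise ValueError(f"{records_file}: case {rec.get('id')!r} lacks {missing}")
            out_name = rec.get("output_image")
            yield (
                folder.parent.name,  # e.g. "Symbolic_Reasoning"
                folder.name,         # e.g. "Sudoku"
                rec,
                folder / rec["input_image"],
                folder / out_name if out_name else None,
            )


if __name__ == "__main__":
    # "ViGoR-Bench" is a hypothetical local download path; adjust as needed.
    per_category = Counter(category for category, *_ in iter_cases("ViGoR-Bench"))
    print(per_category)
```

Counting cases per top-level category this way can serve as a rough sanity check against `statistics.json`, whose schema this card does not document.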


Provider:

VincentHancoder
