VincentHancoder/ViGoR-Bench
Source: Hugging Face (updated 2026-03-24, indexed 2026-03-29)
Download link:
https://hf-mirror.com/datasets/VincentHancoder/ViGoR-Bench
---
license: cc-by-nc-4.0
task_categories:
- image-to-image
- visual-question-answering
tags:
- benchmark
- reasoning
- vision
- generative-model
- evaluation
pretty_name: ViGoR-Bench
size_categories:
- 1K<n<10K
---
<div align="center">
<h1>ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?</h1>
[Hugging Face Dataset](https://huggingface.co/datasets/VincentHancoder/ViGoR-Bench)
[License: CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
<img src="ViGoR_overview.png" alt="ViGoR-Bench Overview" width="360"/>
</div>
---
## 🔍 Overview
**ViGoR-Bench** (**Vi**sion-**G**enerative **R**easoning-centric Benchmark) is a unified evaluation framework designed to stress-test the reasoning capabilities of visual generative models. Beneath the stunning visual fidelity of modern AI-generated content (AIGC) models lies a *logical desert*: these systems frequently fail tasks that require physical, causal, or complex spatial reasoning. Existing evaluations rely on superficial metrics or fragmented benchmarks, creating a *performance mirage* that overlooks whether the generative process itself is logically sound.
ViGoR-Bench dismantles this mirage through four components:
- **Holistic Cross-Modal Coverage** — bridging Image-to-Image and Video generation tasks.
- **Dual-Track Evaluation** — assessing both intermediate reasoning processes and final outputs.
- **Evidence-Grounded Automated Judge** — ensuring high alignment with human judgment.
- **Granular Diagnostic Analysis** — decomposing performance into fine-grained cognitive dimensions.
Experiments on **20+ leading models** reveal that even state-of-the-art systems harbor significant reasoning deficits, establishing ViGoR-Bench as a critical stress test for the next generation of intelligent vision models.
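To make the idea of decomposing performance concrete, here is a minimal sketch that rolls per-case pass/fail judgments up into subcategory and category accuracies. It assumes a hypothetical input format (one `(category, subcategory, passed)` tuple per judged case) and uses plain accuracy as the metric. The function name `breakdown` is illustrative only. ViGoR-Bench's actual scoring, including how its process-track and outcome-track judgments are combined, is defined by the authors and may differ.

```python
from collections import defaultdict
from typing import Iterable


def breakdown(results: Iterable[tuple[str, str, bool]]) -> dict[str, dict[str, float]]:
    """Hypothetical aggregation: accuracy per category and per subcategory.

    results: one (category, subcategory, passed) tuple per judged case.
    Returns {category: {"_overall": acc, subcategory: acc, ...}}.
    """
    passed = defaultdict(lambda: defaultdict(int))
    total = defaultdict(lambda: defaultdict(int))
    for category, subcategory, ok in results:
        total[category][subcategory] += 1
        passed[category][subcategory] += int(ok)

    report: dict[str, dict[str, float]] = {}
    for category, subs in total.items():
        cat_total = sum(subs.values())
        cat_passed = sum(passed[category].values())
        # Category score is micro-averaged over its cases (an assumption;
        # a macro average over subcategories would be equally plausible).
        row = {"_overall": cat_passed / cat_total}
        for sub, n in subs.items():
            row[sub] = passed[category][sub] / n
        report[category] = row
    return report
```

Reporting subcategory scores alongside the category average is what lets a weakness in, say, `Sudoku` stay visible even when other symbolic tasks score well.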
---
## 📂 Dataset Structure
```
ViGoR-Bench/
├── README.md
├── statistics.json
│
├── Physical_Reasoning/
│ ├── Sorting_and_Categorization/
│ │ ├── records.json
│ │ ├── input_XXXX.png
│ │ └── ...
│ ├── Situational_Decision_Making/
│ ├── Attribute_Recognition/
│ ├── Object_Assembly/
│ ├── Spatial_Reasoning/
│ └── Measurement_and_Verification/
│
├── Knowledge_Reasoning/
│ ├── Common_Sense/
│ ├── Geography/
│ ├── Biology/
│ ├── Physics/
│ ├── Sports/
│ ├── Chemistry/
│ └── History/
│
└── Symbolic_Reasoning/
├── Block_Building/
├── Algebraic_Calculation/
├── Function_Plotting/
├── Jigsaw_Puzzle/
├── Klotski_Puzzle/
├── Maze_Navigation/
└── Sudoku/
```
Each subcategory folder contains:
- **`records.json`** — Ground-truth annotations for all cases in that category.
- **`input_XXXX.png`** — Input images provided to the model.
- **`output_XXXX.png`** — Reference ground-truth images (where applicable).
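The layout above is regular enough to index programmatically. The sketch below assumes you already have a local copy of the dataset. One way to get one is `huggingface_hub.snapshot_download(repo_id="VincentHancoder/ViGoR-Bench", repo_type="dataset")`. The helper `index_subcategories` is a hypothetical name and is not part of any released tooling. It walks the three top-level reasoning categories and counts the entries in each subcategory's `records.json`.

```python
import json
from pathlib import Path
from typing import Union

# Top-level reasoning categories, as listed in the directory tree.
CATEGORIES = ("Physical_Reasoning", "Knowledge_Reasoning", "Symbolic_Reasoning")


def index_subcategories(root: Union[str, Path]) -> dict[str, dict[str, int]]:
    """Hypothetical helper: map category -> subcategory -> number of cases.

    Assumes each subcategory folder holds a records.json that is a JSON
    array with one entry per case. Folders without records.json are skipped,
    as are missing top-level categories (e.g. after a partial download).
    """
    root = Path(root)
    index: dict[str, dict[str, int]] = {}
    for category in CATEGORIES:
        cat_dir = root / category
        if not cat_dir.is_dir():
            continue
        counts: dict[str, int] = {}
        for sub_dir in sorted(p for p in cat_dir.iterdir() if p.is_dir()):
            records_path = sub_dir / "records.json"
            if not records_path.is_file():
                continue
            with records_path.open(encoding="utf-8") as f:
                counts[sub_dir.name] = len(json.load(f))
        index[category] = counts
    return index
```

Running `index_subcategories("ViGoR-Bench")` on a full copy should yield one entry per subcategory folder shown above. Summing the inner counts gives per-category totals, which can be cross-checked against `statistics.json` (its schema is not described on this card).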
---
## 📝 Annotation Format
Each `records.json` is a JSON array with one object per case. Each object contains the following fields:
| Field | Description |
|---|---|
| `id` | Unique case identifier |
| `input_image` | Filename of the input image |
| `edit_instruction` | Task instruction given to the generative model |
| `ref_text` | Textual description of the expected output (ground truth) |
| `output_image` | Filename of the reference GT image (if available) |
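The following is a minimal loading sketch based on the fields in the table above. It assumes that image filenames are relative to the subcategory folder holding `records.json`, which is what the directory listing suggests. The card does not say whether an unavailable `output_image` is omitted or set to `null`, so the code accepts both. The `Case` dataclass and the `load_cases` function are hypothetical names used for illustration.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional, Union


@dataclass(frozen=True)
class Case:
    """One annotated case; field names follow the records.json table."""
    id: Any                        # type not specified by the card (int or str)
    input_image: Path              # resolved path to the model input
    edit_instruction: str          # instruction given to the generative model
    ref_text: str                  # textual ground truth for the expected output
    output_image: Optional[Path]   # reference GT image, if the case has one


def load_cases(subcategory_dir: Union[str, Path]) -> list[Case]:
    """Parse <subcategory_dir>/records.json into Case objects.

    Assumption: image filenames are relative to subcategory_dir.
    A missing, null, or empty output_image means "no reference image".
    """
    base = Path(subcategory_dir)
    records_path = base / "records.json"
    with records_path.open(encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError(f"{records_path} is not a JSON array")

    cases = []
    for rec in records:
        out_name = rec.get("output_image")
        cases.append(
            Case(
                id=rec["id"],
                input_image=base / rec["input_image"],
                edit_instruction=rec["edit_instruction"],
                ref_text=rec["ref_text"],
                output_image=base / out_name if out_name else None,
            )
        )
    return cases
```

For example, `load_cases("ViGoR-Bench/Symbolic_Reasoning/Sudoku")` returns one `Case` per array entry. Cases with `output_image is None` carry only the textual ground truth in `ref_text`.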
---
## Citation
If you find ViGoR-Bench useful, please cite our paper:
```bibtex
@article{vigor2025,
title={ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?},
author={},
year={2025}
}
```
---
## License
This dataset is released under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.
Provided by:
VincentHancoder



