
VincentHancoder/ViGoR-Bench

Source: Hugging Face. Updated 2026-03-24; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/VincentHancoder/ViGoR-Bench
Resource description:
---
license: cc-by-nc-4.0
task_categories:
- image-to-image
- visual-question-answering
tags:
- benchmark
- reasoning
- vision
- generative-model
- evaluation
pretty_name: ViGoR-Bench
size_categories:
- 1K<n<10K
---

<div align="center">

<h1>ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?</h1>

[![Dataset](https://img.shields.io/badge/🤗%20Hugging%20Face-Dataset-blue)](https://huggingface.co/datasets/VincentHancoder/ViGoR-Bench)
[![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)

<img src="ViGoR_overview.png" alt="ViGoR-Bench Overview" width="360"/>

</div>

---

## 🔍 Overview

**ViGoR-Bench** (**Vi**sion-**G**enerative **R**easoning-centric Benchmark) is a unified evaluation framework designed to stress-test the reasoning capabilities of visual generative models.

Beneath the stunning visual fidelity of modern AIGC models lies a *logical desert*: systems frequently fail tasks requiring physical, causal, or complex spatial reasoning. Existing evaluations, relying on superficial metrics or fragmented benchmarks, create a *performance mirage* that overlooks the generative process.

ViGoR-Bench dismantles this mirage through:

- **Holistic Cross-Modal Coverage**: bridging Image-to-Image and Video generation tasks.
- **Dual-Track Evaluation**: assessing both intermediate reasoning processes and final outputs.
- **Evidence-Grounded Automated Judge**: ensuring high alignment with human judgment.
- **Granular Diagnostic Analysis**: decomposing performance into fine-grained cognitive dimensions.

Experiments on **20+ leading models** reveal that even state-of-the-art systems harbor significant reasoning deficits, establishing ViGoR-Bench as a critical stress test for the next generation of intelligent vision models.
---

## 📂 Dataset Structure

```
ViGoR-Bench/
├── README.md
├── statistics.json
│
├── Physical_Reasoning/
│   ├── Sorting_and_Categorization/
│   │   ├── records.json
│   │   ├── input_XXXX.png
│   │   └── ...
│   ├── Situational_Decision_Making/
│   ├── Attribute_Recognition/
│   ├── Object_Assembly/
│   ├── Spatial_Reasoning/
│   └── Measurement_and_Verification/
│
├── Knowledge_Reasoning/
│   ├── Common_Sense/
│   ├── Geography/
│   ├── Biology/
│   ├── Physics/
│   ├── Sports/
│   ├── Chemistry/
│   └── History/
│
└── Symbolic_Reasoning/
    ├── Block_Building/
    ├── Algebraic_Calculation/
    ├── Function_Plotting/
    ├── Jigsaw_Puzzle/
    ├── Klotski_Puzzle/
    ├── Maze_Navigation/
    └── Sudoku/
```

Each subcategory folder contains:

- **`records.json`**: Ground-truth annotations for all cases in that category.
- **`input_XXXX.png`**: Input images provided to the model.
- **`output_XXXX.png`**: Reference ground-truth images (where applicable).

---

## 📝 Annotation Format

Each `records.json` is a JSON array. The annotation fields are described below:

| Field | Description |
|---|---|
| `id` | Unique case identifier |
| `input_image` | Filename of the input image |
| `edit_instruction` | Task instruction given to the generative model |
| `ref_text` | Textual description of the expected output (ground truth) |
| `output_image` | Filename of the reference GT image (if available) |

---

## Citation

If you find ViGoR-Bench useful, please cite our paper:

```bibtex
@article{vigor2025,
  title={ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?},
  author={},
  year={2025}
}
```

---

## License

This dataset is released under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.
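
---

## Example: Reading the Annotations

The directory layout and `records.json` fields described under Dataset Structure and Annotation Format suggest a simple way to enumerate cases from a local copy of the dataset. The following is a minimal sketch, not an official loader. The function name `iter_cases` and the added keys `domain`, `subcategory`, `input_path`, and `output_path` are hypothetical. It assumes a two-level `<domain>/<subcategory>/records.json` layout, and it treats every field except `output_image` as required, which the dataset card does not state explicitly.

```python
import json
from pathlib import Path
from typing import Iterator

# Field names come from the annotation table; treating these four as
# mandatory is an assumption of this sketch.
REQUIRED_FIELDS = ("id", "input_image", "edit_instruction", "ref_text")


def iter_cases(root: str) -> Iterator[dict]:
    """Yield one dict per benchmark case found under `root`.

    Assumes the layout <root>/<domain>/<subcategory>/records.json.
    """
    for records_path in sorted(Path(root).glob("*/*/records.json")):
        subcategory_dir = records_path.parent
        with records_path.open(encoding="utf-8") as f:
            records = json.load(f)  # each records.json is a JSON array

        for record in records:
            missing = [k for k in REQUIRED_FIELDS if k not in record]
            if missing:
                raise KeyError(f"{records_path}: record missing fields {missing}")

            # output_image is documented as "if available", so it may be absent.
            output_name = record.get("output_image")
            yield {
                **record,
                "domain": subcategory_dir.parent.name,
                "subcategory": subcategory_dir.name,
                "input_path": subcategory_dir / record["input_image"],
                "output_path": (subcategory_dir / output_name) if output_name else None,
            }
```

Calling `list(iter_cases("ViGoR-Bench"))` on a downloaded copy would then give one dictionary per case, with image filenames resolved relative to their subcategory folder. Resolving paths at load time keeps downstream evaluation code independent of the folder nesting.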

Provided by:
VincentHancoder