VRC-Bench Visual Reasoning Benchmark Dataset

Source: HyperAI (超神经) · Updated 2025-02-13 · Indexed 2025-01-25

Download link:

https://hyper.ai/cn/datasets/37360

Overview:

VRC-Bench is the first benchmark designed specifically for multimodal step-by-step reasoning tasks. It aims to evaluate comprehensively how models perform in complex reasoning scenarios. It was released in 2025 by Mohamed bin Zayed University of Artificial Intelligence, University of Central Florida, Linköping University, and Australian National University, and accompanies the paper "LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs". Traditional benchmarks score only the accuracy of the final result; VRC-Bench instead evaluates the quality of each individual reasoning step, which gives a more fine-grained picture of model capabilities.

Created:

2025-01-21

Aggregated summary

Background overview

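VRC-Bench is the first benchmark dataset designed specifically for multimodal step-by-step reasoning, released jointly in 2025 by several academic institutions. It evaluates the quality of each reasoning step rather than only the final result, which gives a fine-grained measure of how models perform in complex scenarios. The dataset covers 8 distinct domains and contains more than 4,000 manually verified reasoning steps, used to test both the accuracy and the logical coherence of model reasoning.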
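
To make the distinction between final-answer scoring and step-level scoring concrete, here is a minimal Python sketch. It is not the benchmark's actual metric, which this page does not describe: the `Sample` structure, the function names, and the word-overlap similarity are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical benchmark item: reference reasoning steps plus a final answer."""
    reference_steps: list[str]
    reference_answer: str


def token_overlap(predicted: str, reference: str) -> float:
    """Toy similarity: Jaccard overlap of lowercase word sets.

    Assumption: a real evaluation would use a far stronger judge; this is
    only a placeholder so the example stays self-contained.
    """
    pred_tokens = set(predicted.lower().split())
    ref_tokens = set(reference.lower().split())
    if not pred_tokens and not ref_tokens:
        return 1.0
    return len(pred_tokens & ref_tokens) / len(pred_tokens | ref_tokens)


def final_answer_score(predicted_answer: str, sample: Sample) -> float:
    """Traditional scoring: 1.0 if the final answer matches, else 0.0."""
    same = predicted_answer.strip().lower() == sample.reference_answer.strip().lower()
    return 1.0 if same else 0.0


def step_score(predicted_steps: list[str], sample: Sample) -> float:
    """Step-level scoring: for each reference step, take the best-matching
    predicted step, then average over all reference steps."""
    if not sample.reference_steps or not predicted_steps:
        return 0.0
    total = 0.0
    for ref in sample.reference_steps:
        total += max(token_overlap(pred, ref) for pred in predicted_steps)
    return total / len(sample.reference_steps)


sample = Sample(
    reference_steps=[
        "count the red blocks",
        "count the blue blocks",
        "add the two counts",
    ],
    reference_answer="7",
)

careful_steps = list(sample.reference_steps)  # reasoning that follows the reference
lucky_steps = ["guess the total"]             # right answer, unsupported reasoning

print(final_answer_score("7", sample))        # same for both responses
print(step_score(careful_steps, sample))
print(step_score(lucky_steps, sample))
```

Both responses get the same final-answer score, but only the one whose steps track the reference reasoning scores well at the step level. This is the kind of difference a step-focused benchmark is meant to surface.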

The content above was collected and summarized by 遇见数据集.