quantiphi-routing/perceive-benchmark

Name: quantiphi-routing/perceive-benchmark
Creator: quantiphi-routing
Published: 2026-04-29 16:51:05
License: not specified

Source: Hugging Face. Updated: 2026-04-29. Indexed: 2026-05-03.

Download link:

https://hf-mirror.com/datasets/quantiphi-routing/perceive-benchmark

Description:

PERCEIVE (Psychophysics-grounded Elicitation for Routing Cost-Efficiency In Vision-Language Evaluation) is a 4,801-sample document-image QA benchmark for cost-aware VLM routing. Each sample carries three psychophysical complexity annotations (Visual Dependency Score, Reasoning Depth Score, Spatial Extent Score) and a routing label that identifies the cheapest model-budget configuration that answers the sample correctly. Routing labels are derived with a QUEST-style adaptive cascade over 7 commercial VLMs at 4 reasoning-budget levels (28 configurations); the cascade achieves a 60.7% cost reduction while keeping 100% agreement with the ground-truth labels. The dataset includes sample data, routing labels, model evaluation results, image embeddings, and other files. Samples are drawn from 16 public document-image datasets, including DocVQA and SlideVQA.
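
The routing-label definition above (the cheapest model-budget configuration that answers a sample correctly) can be illustrated with a minimal sketch. The model names, budget names, and cost figures below are hypothetical placeholders, since the listing does not name the 7 VLMs, the 4 budget levels, or their prices. The sketch also uses an exhaustive search over all configurations rather than the QUEST-style adaptive cascade that the dataset authors describe.

```python
from itertools import product

# Hypothetical placeholders: the listing does not name the actual models or budgets.
MODELS = [f"model_{i}" for i in range(1, 8)]      # 7 models
BUDGETS = ["none", "low", "medium", "high"]       # 4 reasoning-budget levels

# Every (model, budget) pair is one routing configuration: 7 x 4 = 28.
CONFIGS = list(product(MODELS, BUDGETS))


def routing_label(costs, correct):
    """Return the cheapest configuration that answered correctly, or None.

    costs:   dict mapping (model, budget) -> cost of one query (assumed known)
    correct: dict mapping (model, budget) -> True if that configuration
             answered this sample correctly
    """
    winners = [cfg for cfg, ok in correct.items() if ok]
    if not winners:
        return None
    # Break cost ties deterministically by configuration name.
    return min(winners, key=lambda cfg: (costs[cfg], cfg))


# Made-up costs: grow with model index and with budget level.
costs = {
    (m, b): (MODELS.index(m) + 1) * (BUDGETS.index(b) + 1)
    for m, b in CONFIGS
}

# Made-up outcome for one sample: only model_3 and above succeed,
# and only at "medium" budget or higher.
correct = {
    (m, b): MODELS.index(m) >= 2 and BUDGETS.index(b) >= 2
    for m, b in CONFIGS
}

label = routing_label(costs, correct)
print(len(CONFIGS), label)
```

Under these invented costs, the label for the example sample is the `model_3` and `medium` pair, because every cheaper configuration answers incorrectly. A sample that no configuration answers correctly gets `None` in this sketch; how the benchmark itself labels such samples is not stated in the listing.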

Provider:

quantiphi-routing
