quantiphi-routing/perceive-benchmark
Source: Hugging Face. Updated: 2026-04-29. Indexed: 2026-05-03.
Download link:
https://hf-mirror.com/datasets/quantiphi-routing/perceive-benchmark
Description:
PERCEIVE (Psychophysics-grounded Elicitation for Routing Cost-Efficiency In Vision-Language Evaluation) is a 4,801-sample document-image question-answering benchmark for cost-aware routing of vision-language models (VLMs). Each sample carries three psychophysical complexity annotations (Visual Dependency Score, Reasoning Depth Score, Spatial Extent Score) and a routing label that identifies the cheapest model-budget configuration able to answer it correctly. The routing labels are derived with a QUEST-style adaptive cascade that achieves a 60.7% cost reduction while maintaining 100% agreement with the ground-truth labels, across 7 commercial VLMs at 4 reasoning-budget levels each (28 configurations). The dataset comprises multiple files, including sample data, routing labels, model evaluation results, and image embeddings. Samples are drawn from 16 public document-image datasets, including DocVQA and SlideVQA.
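To make the routing-label definition concrete, the following is a minimal sketch of picking the cheapest correct configuration for one sample out of 7 models x 4 budget levels. It is an assumption-laden illustration: the model names, budget names, cost values, and correctness table are all hypothetical, and the exhaustive search shown here is a plain stand-in for the definition of the label, not the QUEST-style adaptive cascade the dataset authors used.

```python
from itertools import product

# Hypothetical identifiers: the listing does not name the models or budget levels.
MODELS = [f"model_{i}" for i in range(1, 8)]       # 7 VLMs
BUDGETS = ["none", "low", "medium", "high"]        # 4 reasoning-budget levels
CONFIGS = list(product(MODELS, BUDGETS))           # 7 * 4 = 28 configurations


def routing_label(costs, correct):
    """Return the cheapest (model, budget) pair marked correct, or None if none is."""
    passing = [cfg for cfg in CONFIGS if correct.get(cfg, False)]
    if not passing:
        return None
    return min(passing, key=lambda cfg: costs[cfg])


# Made-up per-sample cost: grows with model index and budget index.
costs = {
    (m, b): (mi + 1) * (bi + 1)
    for mi, m in enumerate(MODELS)
    for bi, b in enumerate(BUDGETS)
}
# Made-up correctness: only larger models with some reasoning budget succeed.
correct = {
    (m, b): (mi >= 3 and bi >= 1)
    for mi, m in enumerate(MODELS)
    for bi, b in enumerate(BUDGETS)
}

label = routing_label(costs, correct)
print(label)  # ('model_4', 'low')
```

Evaluating all 28 configurations per sample is the expensive baseline; an adaptive cascade aims to reach the same label while querying fewer configurations, which is the kind of saving the description reports.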
Provider:
quantiphi-routing



