ViP-Bench
Source: arXiv. Indexed: 2025-09-30
Download link:
https://vip-llava.github.io/
Description:
ViP-Bench contains 303 carefully designed image-question pairs for comprehensively evaluating visual prompt following. The pairs span six categories: object recognition, optical character recognition (OCR), knowledge, math, object relationship reasoning, and language generation. In the default split, the visual prompts are tight bounding boxes.
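To make the idea of a bounding-box visual prompt concrete, here is a minimal sketch in plain Python that overlays a rectangular outline on an image and pairs it with a question. It is an illustration only and does not reflect ViP-Bench's actual file format or tooling: the function name `draw_box_prompt`, the nested-list image representation, the red outline color, and the sample question are all assumptions made for this example.

```python
def draw_box_prompt(image, box, color=(255, 0, 0)):
    """Return a copy of `image` with a 1-pixel box outline drawn on it.

    image: list of rows, each row a list of (r, g, b) tuples.
    box:   (left, top, right, bottom) in inclusive pixel coordinates.
    Hypothetical helper for illustration; not part of any ViP-Bench code.
    """
    left, top, right, bottom = box
    height, width = len(image), len(image[0])
    if not (0 <= left <= right < width and 0 <= top <= bottom < height):
        raise ValueError("box must lie inside the image")

    # Copy each row so the input image is left unmodified.
    out = [list(row) for row in image]

    # Horizontal edges of the box.
    for x in range(left, right + 1):
        out[top][x] = color
        out[bottom][x] = color

    # Vertical edges of the box.
    for y in range(top, bottom + 1):
        out[y][left] = color
        out[y][right] = color

    return out


# A 6x5 all-black image with a box prompt drawn around a region of interest.
image = [[(0, 0, 0)] * 6 for _ in range(5)]
prompted = draw_box_prompt(image, (1, 1, 4, 3))

# Assumed example of a question that refers to the marked region.
question = "What is the object within the red rectangle?"
```

In this sketch the model would receive `prompted` together with `question`, so answering correctly requires attending to the marked region rather than to the image as a whole.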



