prometheus-eval/Perception-Bench

Name: prometheus-eval/Perception-Bench
Creator: prometheus-eval
Published: 2024-01-15 14:25:01
License: not specified

Hugging Face | Updated 2024-01-15 | Indexed 2024-06-22

Download link:

https://hf-mirror.com/datasets/prometheus-eval/Perception-Bench

Resource overview:

Perception-Bench is a benchmark for evaluating the long-form responses of vision-language models (VLMs) across various image domains. It is the held-out test set of the Perception-Collection. Each record includes an image path, an instruction, the original instruction, a reference answer, the scoring criteria, and a description for each score level. The dataset is primarily in English and contains 500 test samples.

Provided by:

prometheus-eval

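Summary of original information

Dataset card

Dataset overview

Perception-Bench is a benchmark for evaluating the long-form responses of vision-language models (VLMs) across various image domains. It is the held-out test set of the Perception-Collection.

Language

English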

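Dataset structure

image: Path to the image used for training; the images come from the MMMU dataset and the COCO 2017 train dataset.
instruction: The input given to the evaluator VLM. It includes the instruction and response to evaluate, the reference answer, and the score rubric.
orig_instruction: The instruction to be evaluated. This differs from instruction, which contains all of the components.
orig_reference_answer: The reference answer to orig_instruction.
orig_criteria: The scoring criteria used to evaluate orig_response.
orig_score1_description: Description of when orig_response should receive a score of 1.
orig_score2_description: Description of when orig_response should receive a score of 2.
orig_score3_description: Description of when orig_response should receive a score of 3.
orig_score4_description: Description of when orig_response should receive a score of 4.
orig_score5_description: Description of when orig_response should receive a score of 5.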
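
The field list above can be mirrored in code. The following is a minimal Python sketch, not part of the original dataset card: the class name PerceptionBenchRecord and the helpers format_rubric and load_records are hypothetical, the split name "test" is an assumption, and the example values are placeholders rather than real benchmark rows. Only load_dataset from the Hugging Face datasets library is a real API call, and it is invoked only inside load_records, which needs network access.

```python
from dataclasses import dataclass, fields


@dataclass
class PerceptionBenchRecord:
    """One benchmark row, using the field names from the dataset card."""
    image: str  # image path, as described in the field list
    instruction: str
    orig_instruction: str
    orig_reference_answer: str
    orig_criteria: str
    orig_score1_description: str
    orig_score2_description: str
    orig_score3_description: str
    orig_score4_description: str
    orig_score5_description: str


def format_rubric(record: PerceptionBenchRecord) -> str:
    """Render the criteria and the five score descriptions as plain text."""
    lines = [f"Criteria: {record.orig_criteria}"]
    for score in range(1, 6):
        description = getattr(record, f"orig_score{score}_description")
        lines.append(f"Score {score}: {description}")
    return "\n".join(lines)


def load_records(split: str = "test") -> list:
    """Fetch the benchmark from the Hugging Face Hub (needs network access).

    Assumption: the split is named "test"; check the dataset page if not.
    """
    # Imported lazily so the rest of the sketch runs without the library.
    from datasets import load_dataset

    names = [f.name for f in fields(PerceptionBenchRecord)]
    dataset = load_dataset("prometheus-eval/Perception-Bench", split=split)
    return [
        PerceptionBenchRecord(**{name: row[name] for name in names})
        for row in dataset
    ]


# Hypothetical placeholder values, only to show the output shape.
example = PerceptionBenchRecord(
    image="example.jpg",
    instruction="(full evaluator prompt)",
    orig_instruction="(instruction to evaluate)",
    orig_reference_answer="(reference answer)",
    orig_criteria="(scoring criteria)",
    orig_score1_description="(score 1 description)",
    orig_score2_description="(score 2 description)",
    orig_score3_description="(score 3 description)",
    orig_score4_description="(score 4 description)",
    orig_score5_description="(score 5 description)",
)
rubric = format_rubric(example)
print(rubric)
```

Keeping the five score descriptions as separate attributes matches the flat column layout listed above; format_rubric only gathers them into one readable block.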

Data splits

Name	Number of test samples
Perception-Bench	500

Citation

If you find this benchmark useful, please consider citing our paper:

@misc{lee2024prometheusvision,
  title={Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation},
  author={Seongyun Lee and Seungone Kim and Sue Hyun Park and Geewook Kim and Minjoon Seo},
  year={2024},
  eprint={2401.06591},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
