VisIT-Bench

Name: VisIT-Bench
Creator: Hebrew University of Jerusalem
Published: 2023-12-26 23:57:47
License: No description available

Source: arXiv. Updated: 2023-12-26. Indexed: 2024-06-21.

Download link:

https://visit-bench.github.io/

Description:

VisIT-Bench is a benchmark for evaluating vision-language instruction-following models, created by the Hebrew University of Jerusalem and other institutions. It contains 592 test queries, each paired with a human-authored instruction-conditioned caption, and is intended to assess how models perform in real-world use. The tasks range from basic recognition to game playing and creative generation. The detailed instruction-conditioned captions make it possible to collect human-verified reference outputs and to evaluate candidate multimodal generations automatically with a text-only large language model. VisIT-Bench is a dynamic benchmark: practitioners can submit their model's responses and have them evaluated using the data, code, and leaderboard on the project website.
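
As a rough illustration of how a text-only judge can stand in for a multimodal one, the sketch below assembles a pairwise comparison prompt in which the instruction-conditioned caption replaces the image. It is not the benchmark's actual evaluation code: the `JudgeInput` class, its field names, the prompt wording, and the `parse_verdict` helper are all assumptions made for this example, and no LLM is called.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JudgeInput:
    """Hypothetical container; field names are not taken from the VisIT-Bench release."""
    instruction: str  # the query given to the vision-language model
    caption: str      # human-authored instruction-conditioned caption (replaces the image)
    reference: str    # human-verified reference output


def build_judge_prompt(item: JudgeInput, response_a: str, response_b: str) -> str:
    """Assemble a text-only prompt asking a judge LLM to pick the better response."""
    return "\n".join([
        "You are comparing two responses to an instruction about an image.",
        "You cannot see the image; rely on the description below.",
        f"Image description: {item.caption}",
        f"Instruction: {item.instruction}",
        f"Reference response: {item.reference}",
        f"Response A: {response_a}",
        f"Response B: {response_b}",
        "Answer with a single letter, A or B, naming the better response.",
    ])


def parse_verdict(judge_output: str) -> Optional[str]:
    """Return 'A' or 'B' if the judge's reply starts with that letter, else None."""
    first = judge_output.strip()[:1].upper()
    return first if first in ("A", "B") else None
```

The prompt string would be sent to whichever text-only model serves as the judge, and `parse_verdict` would be applied to its reply; collecting such pairwise verdicts across many queries is one way a leaderboard ranking could be derived.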

Provider:

Hebrew University of Jerusalem

Created:

2023-08-12
