VisIT-Bench
arXiv · Updated 2023-12-26 · Indexed 2024-06-21
Download link:
https://visit-bench.github.io/
Overview:
VisIT-Bench is a benchmark dataset for evaluating instruction-following vision-language models, created by institutions including the Hebrew University of Jerusalem. It comprises 592 test queries, each paired with a human-authored instruction-conditioned caption, and is intended to assess how models perform in real-world applications. The tasks range from basic recognition to game playing and creative generation. The detailed instruction-conditioned captions make it possible to collect human-verified reference outputs and to evaluate candidate multimodal generations automatically with a text-only large language model. VisIT-Bench is a dynamic benchmark: practitioners can submit their models' responses and have them evaluated using the data, code, and leaderboard available on the project website.
Provided by:
Hebrew University of Jerusalem
Created:
2023-08-12



