MERLIM
Source: arXiv | Updated: 2023-12-04 | Indexed: 2024-06-21
Download link:
https://github.com/ojedaf/MERLIM
Description:
MERLIM is a multimodal evaluation benchmark developed by King Abdullah University of Science and Technology (KAUST) for assessing large image-language models. It comprises over 279,000 image-question pairs and is mainly used to detect cross-modal "hallucination" events, in which the language output refers to visual concepts that are absent from, or irrelevant to, the image. The dataset includes edited images to verify whether a model's predictions are grounded in valid visual information, and in this way assesses performance on fundamental computer vision tasks. MERLIM covers object recognition, instance counting, and inter-object relationship understanding, and aims to address the limitations of current models' zero-shot learning capabilities.
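As an illustration of the idea described above, the following is a minimal sketch of how hallucinated objects might be flagged, and how an original/edited image pair might be compared. It is not MERLIM's actual evaluation code or metric definition: the `Prediction` record, the function names, and the rate formula are all hypothetical and chosen only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical record: what a model named for one image."""
    image_id: str
    predicted: set[str]     # object classes the model says are present
    ground_truth: set[str]  # object classes actually annotated in the image


def hallucinated(pred: Prediction) -> set[str]:
    """Objects the model named that are not annotated in the image."""
    return pred.predicted - pred.ground_truth


def hallucination_rate(preds: list[Prediction]) -> float:
    """Share of all predicted objects that are hallucinated (assumed metric)."""
    total = sum(len(p.predicted) for p in preds)
    if total == 0:
        return 0.0
    return sum(len(hallucinated(p)) for p in preds) / total


def ungrounded_after_edit(original: Prediction, edited: Prediction, removed: str) -> bool:
    """True if the model still names `removed` after it was edited out of the image."""
    return removed in original.predicted and removed in edited.predicted


# Toy data: "cat" has been edited out of the second image.
orig = Prediction("img1", {"dog", "cat", "car"}, {"dog", "cat"})
edit = Prediction("img1_edit", {"dog", "cat"}, {"dog"})
```

Here `ungrounded_after_edit(orig, edit, "cat")` flags a prediction that did not change when the object was removed, which suggests the answer was not grounded in the image content.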
Provided by:
King Abdullah University of Science and Technology (KAUST)
Created:
2023-12-04



