MM-Hallu/MERLIM

Name: MM-Hallu/MERLIM
Creator: MM-Hallu
Published: 2026-04-24 16:22:40
License: Not specified

Source: Hugging Face (updated 2026-04-24, indexed 2026-04-26)

Download link:

https://hf-mirror.com/datasets/MM-Hallu/MERLIM

Description:

MERLIM (Multi-modal Evaluation Benchmark for Large Image-Language Models) is a scalable benchmark for assessing instruction-tuned large vision-language models (IT-LVLMs) on fundamental computer vision tasks, with a focus on detecting cross-modal hallucination events. It contains over 42K entries across three evaluation splits:

- classification_counting (31,373 entries): object counting and recognition on edited (inpainted) COCO images from which objects have been removed.
- reasoning_curated (5,630 entries): inter-object relationship understanding with curated relationship sets and yes/no questions.
- reasoning_random (5,630 entries): inter-object relationship understanding with randomly selected relationships.

Each entry includes the original COCO image and the object-removal metadata (category, bounding box). Entries in the relationship splits also carry predicate/subject/object annotations and positive and negative yes/no question-answer pairs.
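
The relationship splits pair each annotated relationship with a positive and a negative yes/no question. The following is a minimal sketch of how such pairs could be scored. It assumes that the correct answer to the positive question is "yes" and the correct answer to the negative question is "no". `RelationItem`, `normalize`, and `score` are hypothetical names used for illustration. They are not the dataset's actual schema or MERLIM's official metric. The sketch also checks that the three published split sizes add up to the stated total of over 42K.

```python
from dataclasses import dataclass

# Split sizes as stated in the dataset description.
SPLIT_SIZES = {
    "classification_counting": 31_373,
    "reasoning_curated": 5_630,
    "reasoning_random": 5_630,
}


@dataclass
class RelationItem:
    # Hypothetical record layout; the real dataset fields may be named differently.
    subject: str
    predicate: str
    obj: str
    positive_answer: str  # model reply to the question whose correct answer is assumed to be "yes"
    negative_answer: str  # model reply to the question whose correct answer is assumed to be "no"


def normalize(answer: str) -> str:
    """Map a free-form model reply to 'yes', 'no', or 'other' using its first word."""
    words = answer.strip().lower().split()
    first = words[0].strip(".,!?") if words else ""
    return first if first in ("yes", "no") else "other"


def score(items: list[RelationItem]) -> tuple[float, float]:
    """Return (accuracy, hallucination_rate) over positive/negative question pairs.

    accuracy: fraction of all questions (two per item) answered correctly.
    hallucination_rate: fraction of negative questions answered "yes", i.e. the
    model affirmed a relationship that is assumed not to hold in the image.
    """
    if not items:
        raise ValueError("need at least one item to score")
    correct = 0
    hallucinated = 0
    for item in items:
        pos = normalize(item.positive_answer)
        neg = normalize(item.negative_answer)
        correct += (pos == "yes") + (neg == "no")
        hallucinated += neg == "yes"
    return correct / (2 * len(items)), hallucinated / len(items)


if __name__ == "__main__":
    print(sum(SPLIT_SIZES.values()))  # 42633
    demo = [
        RelationItem("dog", "on", "sofa", "Yes.", "No."),
        RelationItem("cup", "next to", "laptop", "yes", "Yes, it is."),
    ]
    print(*score(demo))  # 0.75 0.5
```

Reporting the negative-question "yes" rate separately from overall accuracy is one way to distinguish a model that affirms relationships it cannot see from one that simply answers poorly.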

Provider:

MM-Hallu
