MM-Hallu/MERLIM
Source: Hugging Face | Updated: 2026-04-24 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/MM-Hallu/MERLIM
Resource overview:
MERLIM (Multi-modal Evaluation Benchmark for Large Image-Language Models) is a scalable benchmark for assessing instruction-tuned large vision and language models (IT-LVLMs) on fundamental computer vision tasks, with a focus on detecting cross-modal hallucination events. It contains over 42K entries across three evaluation splits:
- classification_counting (31,373 entries): object counting and recognition tasks on edited (in-painted) COCO images from which objects have been removed.
- reasoning_curated (5,630 entries): inter-object relationship understanding with curated relationship sets and yes/no questions.
- reasoning_random (5,630 entries): inter-object relationship understanding with randomly selected relationships.
Each entry includes the original COCO image and object-removal metadata (category, bounding box). Entries for the relationship tasks additionally carry predicate/subject/object annotations and positive and negative yes/no question-answer pairs.
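The relationship splits pair each positive yes/no question with a negative one. As a minimal sketch of how such pairs could be scored, assuming a hypothetical record layout (the `RelationEntry` fields are illustrative, not MERLIM's actual schema) and a hypothetical `ask(image_id, question)` wrapper around the model under test, the code below counts a "yes" to the negative question as a hallucination. That scoring rule is an illustrative choice, not necessarily the metric used by the MERLIM authors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class RelationEntry:
    # Hypothetical fields for illustration; not the actual MERLIM schema.
    image_id: str
    subject: str
    predicate: str
    obj: str
    positive_question: str  # correct answer is "yes"
    negative_question: str  # correct answer is "no"


def normalize(reply: str) -> str:
    """Reduce a free-form reply to 'yes', 'no', or 'other' using its first word."""
    words = reply.strip().lower().split()
    first = words[0].strip(".,!?") if words else ""
    return first if first in ("yes", "no") else "other"


def score_pairs(
    entries: Iterable[RelationEntry],
    ask: Callable[[str, str], str],
) -> Dict[str, float]:
    """Score a model on positive/negative yes/no pairs.

    `ask(image_id, question)` is a hypothetical model wrapper. Answering
    "yes" to a negative question is counted as a hallucination (an
    illustrative rule, not necessarily MERLIM's official metric).
    """
    n = pos_ok = neg_ok = pair_ok = hallucinated = 0
    for e in entries:
        pos = normalize(ask(e.image_id, e.positive_question))
        neg = normalize(ask(e.image_id, e.negative_question))
        n += 1
        pos_ok += pos == "yes"
        neg_ok += neg == "no"
        pair_ok += pos == "yes" and neg == "no"
        hallucinated += neg == "yes"
    if n == 0:
        return {}
    return {
        "positive_accuracy": pos_ok / n,
        "negative_accuracy": neg_ok / n,
        "pair_accuracy": pair_ok / n,
        "hallucination_rate": hallucinated / n,
    }


# Toy entry (made up for illustration) and a degenerate model that always says "Yes."
toy = [
    RelationEntry(
        image_id="toy-0",
        subject="person",
        predicate="riding",
        obj="horse",
        positive_question="Is the person riding the horse?",
        negative_question="Is the horse riding the person?",
    )
]
always_yes = lambda image_id, question: "Yes."
print(score_pairs(toy, always_yes))
# {'positive_accuracy': 1.0, 'negative_accuracy': 0.0, 'pair_accuracy': 0.0, 'hallucination_rate': 1.0}
```

The always-"yes" model is perfect on positive questions yet hallucinates on every negative one, which is why reporting the paired numbers is more informative than positive accuracy alone.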
Provided by:
MM-Hallu



