MM-Hallu/MIHBench

Name: MM-Hallu/MIHBench
Creator: MM-Hallu
Published: 2026-04-30 04:15:49
License: not specified

Source: Hugging Face. Updated 2026-04-30; indexed 2026-05-03.

Download link:

https://hf-mirror.com/datasets/MM-Hallu/MIHBench

Description:

MIHBench is a Multi-Image Hallucination Benchmark for evaluating multi-image understanding in Multimodal Large Language Models (MLLMs). The dataset contains 3,200 samples across 4 tasks (800 each), and each sample contains 2-4 images from the COCO dataset. Each sample provides a list of images, a natural-language question about those images, a ground-truth label ("yes" or "no"), a task identifier, the number of images, and the source image filenames. Samples in the count task carry two additional fields: `injected` (boolean) and `object_counts` (JSON string). The four tasks are count (2 images; asks whether the target object appears the same number of times in both images), existence_adversarial (3 images; asks whether the target object is present in all images, using rare or confusing objects), existence_popular (3 images; common objects), and existence_random (3 images; random objects). Evaluation metrics are Accuracy, Precision, Recall, and F1. The dataset is sourced from MIHBench, published at ACM Multimedia 2025.
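
The four metrics named above can be computed from yes/no answers as in the minimal sketch below. It is not the benchmark's official scoring script. It assumes "yes" is the positive class, that any prediction other than "yes" counts as a negative answer, and that each record is a dict with the keys `task`, `label`, and `prediction`; these key names are hypothetical and may not match the dataset's actual column names.

```python
from collections import defaultdict


def binary_metrics(labels, predictions, positive="yes"):
    """Return accuracy, precision, recall, and F1 for yes/no answers.

    Assumption: `positive` is the positive class, and any prediction
    that is not `positive` (after stripping and lowercasing) is negative.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")

    tp = fp = tn = fn = 0
    for gold, pred in zip(labels, predictions):
        gold_pos = gold.strip().lower() == positive
        pred_pos = pred.strip().lower() == positive
        if gold_pos and pred_pos:
            tp += 1
        elif pred_pos:
            fp += 1
        elif gold_pos:
            fn += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    # Guard every division so empty or one-sided inputs yield 0.0.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def metrics_by_task(records):
    """Group records by their task identifier and score each group.

    Each record is assumed to be a dict with the hypothetical keys
    "task", "label", and "prediction".
    """
    grouped = defaultdict(lambda: ([], []))
    for record in records:
        gold, pred = grouped[record["task"]]
        gold.append(record["label"])
        pred.append(record["prediction"])
    return {task: binary_metrics(gold, pred)
            for task, (gold, pred) in grouped.items()}


if __name__ == "__main__":
    # Toy records for illustration only; they are not taken from the dataset.
    demo = [
        {"task": "count", "label": "yes", "prediction": "yes"},
        {"task": "count", "label": "no", "prediction": "yes"},
        {"task": "count", "label": "yes", "prediction": "no"},
        {"task": "count", "label": "no", "prediction": "no"},
    ]
    print(metrics_by_task(demo))
```

Scoring each task separately keeps the three existence variants comparable, since a model that answers "yes" too readily would show high recall but low precision on the harder variants.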

Provider:

MM-Hallu
