Tong613/LIVE-multi-image-bench

Name: Tong613/LIVE-multi-image-bench
Creator: Tong613
Published: 2026-03-25 05:36:05
License: no description provided

Source: Hugging Face. Updated: 2026-03-25. Indexed: 2026-03-29.

Download link:

https://hf-mirror.com/datasets/Tong613/LIVE-multi-image-bench

Description:

---
license: mit
task_categories:
- visual-question-answering
- image-to-text
language:
- en
tags:
- multi-image
- hallucination
- evaluation
- LVLM
- multimodal
size_categories:
- 10K<n<100K
---

# 📸 LIVE: An LLM-assisted Multi-Image Visual Hallucination Evaluation Benchmark

[![GitHub](https://img.shields.io/badge/GitHub-Code_&_Tools-blue?logo=github)](https://github.com/Master-PLC/LIVE)

Welcome to the **LIVE** dataset! This benchmark is designed to evaluate multi-image visual hallucinations in Large Vision-Language Models (LVLMs).

## 🌟 Key Features

Unlike traditional single-image benchmarks, LIVE systematically addresses the complexities of multi-image understanding:

- **Two Distinct Scenarios**: Evaluates context-dependent hallucination patterns under **Uniform Image Contexts (UIC)** (content confusion) and **Diverse Image Contexts (DIC)** (context interference).
- **Multi-granularity Assessment Protocol (MAP)**: Measures hallucination rates across varying numbers of target images (1 to 4 images) rather than relying on a single overall query.
- **Comprehensive Task Coverage**: Contains over 32K yes/no questions covering 6 visual recognition tasks: *Object, Material, Color, Sentiment, Action, and Position*.

## 📂 Dataset Structure

The dataset contains 488 multi-image scenarios (242 UIC + 246 DIC) paired with daily-life images from MS-COCO. The data is stored in JSON format.
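The Multi-granularity Assessment Protocol reports hallucination rates separately for each number of target images. The sketch below shows one way such a per-group tally might be computed. It assumes, without confirmation from the dataset card, that the hallucination rate is the fraction of counterfactual (`no_question`) probes that a model answers with "yes"; the official metric may differ. The `no_question_answer` field and the sample records are hypothetical, while `type` and `qtype` are the fields documented under Key Fields.

```python
from collections import defaultdict


def hallucination_rate_by_group(records):
    """Return {(type, qtype): fraction of counterfactual probes answered 'yes'}.

    Assumption: a 'yes' reply to a counterfactual probe counts as a hallucination.
    """
    yes_counts = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        # qtype is stored as a string in the sample entry, so cast it to int.
        key = (rec["type"], int(rec["qtype"]))
        totals[key] += 1
        if rec["no_question_answer"].strip().lower().startswith("yes"):
            yes_counts[key] += 1
    return {key: yes_counts[key] / totals[key] for key in totals}


# Hypothetical model replies to no_question probes (not real results).
records = [
    {"type": "UIC", "qtype": "1", "no_question_answer": "No"},
    {"type": "UIC", "qtype": "4", "no_question_answer": "Yes, she is."},
    {"type": "UIC", "qtype": "4", "no_question_answer": "no"},
    {"type": "DIC", "qtype": "4", "no_question_answer": "Yes"},
]

rates = hallucination_rate_by_group(records)
for (scenario, k), rate in sorted(rates.items()):
    print(f"{scenario} k={k}: {rate:.2f}")
```

Grouping by both `type` and `qtype` keeps the UIC and DIC scenarios separate, so content confusion and context interference are not averaged together.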
### Data Format Example

Here is a sample from our JSON files (e.g., `main_k4_questions_attributes.json`):

```json
{
  "task": "attributes",
  "type": "UIC",
  "qtype": "4",
  "image_id": [
    "COCO_val2014_000000239985.jpg",
    "COCO_val2014_000000376628.jpg",
    "COCO_val2014_000000369763.jpg",
    "COCO_val2014_000000176793.jpg"
  ],
  "yes_question": "Is the lady smiling in image 4?",
  "no_question": "Is the lady frowning in image 4?",
  "ritem": "lady is smiling",
  "hitem": "lady is frowning",
  "yes_question_class": "Sentiment",
  "no_question_class": "Sentiment"
}
```

### 🔑 Key Fields

Each JSON entry contains the following structured information:

- **`task`**: The visual recognition task category (e.g., `attributes`, `actions`, `relations`).
- **`type`**: The multi-image scenario type, either **`UIC`** (Uniform Image Context) or **`DIC`** (Diverse Image Context).
- **`qtype`**: The granularity level, indicating the number of target images involved in the query (ranges from `1` to `4`; stored as a string in the sample above).
- **`image_id`**: A list of the MS-COCO image filenames required for the scenario.
- **`yes_question`** / **`no_question`**: The balanced pair of binary visual questions. The `yes_question` targets factual content, while the `no_question` is the hallucinated (counterfactual) probe.
- **`ritem`** / **`hitem`**: The underlying real (factual) and hallucinated (counterfactual) visual items extracted during the MHI mining process.
- **`yes_question_class`** / **`no_question_class`**: The specific cognitive task class for each question (e.g., `Sentiment`, `Color`, `Position`, `Object`).

---

## 🚀 How to Use

You can load the question-answer pairs with the `datasets` library.

```python
from datasets import load_dataset

# Load the question files stored under the "main" directory of the dataset repo
dataset = load_dataset("Tong613/LIVE-multi-image-bench", data_dir="main")

# Print the first evaluation sample
print(dataset['train'][0])
```
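Each entry bundles a factual question and a counterfactual question over the same image list, so an evaluation loop will typically split it into two probes. The following is a minimal sketch of that step, not the authors' evaluation code. It assumes the expected answers are "yes" for `yes_question` and "no" for `no_question`, and that the MS-COCO images sit in a local directory; `expand_entry` and the `val2014` path are hypothetical names introduced for illustration.

```python
import os


def expand_entry(entry, image_root):
    """Split one LIVE entry into a factual and a counterfactual probe."""
    # Resolve each MS-COCO filename against a local image directory (assumed layout).
    paths = [os.path.join(image_root, name) for name in entry["image_id"]]
    return [
        {
            "images": paths,
            "question": entry["yes_question"],
            "expected": "yes",
            "question_class": entry["yes_question_class"],
        },
        {
            "images": paths,
            "question": entry["no_question"],
            "expected": "no",
            "question_class": entry["no_question_class"],
        },
    ]


# The sample entry from the Data Format Example.
entry = {
    "task": "attributes",
    "type": "UIC",
    "qtype": "4",
    "image_id": [
        "COCO_val2014_000000239985.jpg",
        "COCO_val2014_000000376628.jpg",
        "COCO_val2014_000000369763.jpg",
        "COCO_val2014_000000176793.jpg",
    ],
    "yes_question": "Is the lady smiling in image 4?",
    "no_question": "Is the lady frowning in image 4?",
    "ritem": "lady is smiling",
    "hitem": "lady is frowning",
    "yes_question_class": "Sentiment",
    "no_question_class": "Sentiment",
}

probes = expand_entry(entry, image_root="val2014")  # hypothetical local directory
for probe in probes:
    print(probe["expected"], "|", probe["question"], "|", len(probe["images"]), "images")
```

Both probes share the same image list, so a model sees identical visual context for the factual and the counterfactual question.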

