Tong613/LIVE-multi-image-bench

Source: Hugging Face. Updated 2026-03-25; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Tong613/LIVE-multi-image-bench
Resource description:
---
license: mit
task_categories:
- visual-question-answering
- image-to-text
language:
- en
tags:
- multi-image
- hallucination
- evaluation
- LVLM
- multimodal
size_categories:
- 10K<n<100K
---

# 📸 LIVE: An LLM-assisted Multi-Image Visual Hallucination Evaluation Benchmark

[![GitHub](https://img.shields.io/badge/GitHub-Code_&_Tools-blue?logo=github)](https://github.com/Master-PLC/LIVE)

Welcome to the **LIVE** dataset! This benchmark is designed to evaluate multi-image visual hallucinations in Large Vision-Language Models (LVLMs).

## 🌟 Key Features

Unlike traditional single-image benchmarks, LIVE systematically addresses the complexities of multi-image understanding:

- **Two Distinct Scenarios**: Evaluates context-dependent hallucination patterns under **Uniform Image Contexts (UIC)** (content confusion) and **Diverse Image Contexts (DIC)** (context interference).
- **Multi-granularity Assessment Protocol (MAP)**: Measures hallucination rates across varying numbers of target images (1 to 4 images) rather than relying on a single overall query.
- **Comprehensive Task Coverage**: Contains over 32K yes/no questions covering 6 visual recognition tasks: *Object, Material, Color, Sentiment, Action, and Position*.

## 📂 Dataset Structure

The dataset contains 488 multi-image scenarios (242 UIC + 246 DIC) paired with daily-life images from MS-COCO. The data is stored in JSON format.

### Data Format Example

Here is a sample from our JSON files (e.g., `main_k4_questions_attributes.json`):

```json
{
  "task": "attributes",
  "type": "UIC",
  "qtype": "4",
  "image_id": [
    "COCO_val2014_000000239985.jpg",
    "COCO_val2014_000000376628.jpg",
    "COCO_val2014_000000369763.jpg",
    "COCO_val2014_000000176793.jpg"
  ],
  "yes_question": "Is the lady smiling in image 4?",
  "no_question": "Is the lady frowning in image 4?",
  "ritem": "lady is smiling",
  "hitem": "lady is frowning",
  "yes_question_class": "Sentiment",
  "no_question_class": "Sentiment"
}
```

### 🔑 Key Fields

Each JSON entry contains the following structured information:

- **`task`**: The visual recognition task category (e.g., `attributes`, `actions`, `relations`).
- **`type`**: The multi-image scenario type, either **`UIC`** (Uniform Image Context) or **`DIC`** (Diverse Image Context).
- **`qtype`**: The granularity level, indicating the number of target images involved in the query (`1` to `4`, stored as a string, as in the sample above).
- **`image_id`**: A list of the MS-COCO image filenames required for the scenario.
- **`yes_question`** / **`no_question`**: The balanced pair of binary visual questions. The `yes_question` targets factual content, while the `no_question` is the hallucinated (counterfactual) probe.
- **`ritem`** / **`hitem`**: The underlying real (factual) and hallucinated (counterfactual) visual items extracted during the MHI mining process.
- **`yes_question_class`** / **`no_question_class`**: The specific cognitive task class for each question (e.g., `Sentiment`, `Color`, `Position`, `Object`).

---

## 🚀 How to Use

You can load the question-answer pairs using the `datasets` library.

```python
from datasets import load_dataset

# Load the question files stored under the repository's "main" directory
dataset = load_dataset("Tong613/LIVE-multi-image-bench", data_dir="main")

# Print the first evaluation sample
print(dataset['train'][0])
```
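The balanced `yes_question` / `no_question` pairs and the `qtype` granularity levels described above suggest a simple scoring loop. The following is a minimal sketch, not the official LIVE evaluation code: it reports plain accuracy per `(type, qtype)` bucket, which may differ from the hallucination-rate metric the authors use. The names `expand_entry`, `score_by_granularity`, `answer_fn`, and `always_yes` are hypothetical, and `answer_fn` stands in for whatever LVLM wrapper you use.

```python
from collections import defaultdict


def expand_entry(entry):
    """Turn one benchmark entry into two (question, expected_answer) probes."""
    return [
        (entry["yes_question"], "yes"),  # factual question
        (entry["no_question"], "no"),    # hallucinated (counterfactual) probe
    ]


def score_by_granularity(entries, answer_fn):
    """Return accuracy per (type, qtype) bucket.

    answer_fn is a hypothetical callable taking (image_ids, question)
    and returning the model's answer as the string "yes" or "no".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for entry in entries:
        bucket = (entry["type"], entry["qtype"])
        for question, expected in expand_entry(entry):
            predicted = answer_fn(entry["image_id"], question).strip().lower()
            correct[bucket] += int(predicted == expected)
            total[bucket] += 1
    return {bucket: correct[bucket] / total[bucket] for bucket in total}


# The sample entry shown in the dataset card.
sample = {
    "task": "attributes",
    "type": "UIC",
    "qtype": "4",
    "image_id": [
        "COCO_val2014_000000239985.jpg",
        "COCO_val2014_000000376628.jpg",
        "COCO_val2014_000000369763.jpg",
        "COCO_val2014_000000176793.jpg",
    ],
    "yes_question": "Is the lady smiling in image 4?",
    "no_question": "Is the lady frowning in image 4?",
    "ritem": "lady is smiling",
    "hitem": "lady is frowning",
    "yes_question_class": "Sentiment",
    "no_question_class": "Sentiment",
}


def always_yes(image_ids, question):
    """Stand-in for a real model: answers "yes" to every question."""
    return "yes"


scores = score_by_granularity([sample], always_yes)
print(scores)
```

Because every entry pairs one factual question with one counterfactual probe, a model that always answers "yes" gets exactly half of the probes right, which is why the balanced design penalizes a yes-bias.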

Provider: Tong613