Tong613/LIVE-multi-image-bench
Source: Hugging Face. Updated 2026-03-25; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Tong613/LIVE-multi-image-bench
Description:
---
license: mit
task_categories:
- visual-question-answering
- image-to-text
language:
- en
tags:
- multi-image
- hallucination
- evaluation
- LVLM
- multimodal
size_categories:
- 10K<n<100K
---
# 📸 LIVE: An LLM-assisted Multi-Image Visual Hallucination Evaluation Benchmark
[GitHub repository](https://github.com/Master-PLC/LIVE)
Welcome to the **LIVE** dataset! This benchmark is designed to evaluate multi-image visual hallucinations in Large Vision-Language Models (LVLMs).
## 🌟 Key Features
Unlike traditional single-image benchmarks, LIVE systematically addresses the complexities of multi-image understanding:
- **Two Distinct Scenarios**: Evaluates context-dependent hallucination patterns under **Uniform Image Contexts (UIC)** (content confusion) and **Diverse Image Contexts (DIC)** (context interference).
- **Multi-granularity Assessment Protocol (MAP)**: Measures hallucination rates across varying numbers of target images (1 to 4 images) rather than relying on a single overall query.
- **Comprehensive Task Coverage**: Contains over 32K yes/no questions covering 6 visual recognition tasks: *Object, Material, Color, Sentiment, Action, and Position*.
## 📂 Dataset Structure
The dataset contains 488 multi-image scenarios (242 UIC + 246 DIC) paired with daily-life images from MS-COCO. The data is stored in JSON format.
### Data Format Example
Here is a sample from our JSON files (e.g., `main_k4_questions_attributes.json`):
```json
{
  "task": "attributes",
  "type": "UIC",
  "qtype": "4",
  "image_id": [
    "COCO_val2014_000000239985.jpg",
    "COCO_val2014_000000376628.jpg",
    "COCO_val2014_000000369763.jpg",
    "COCO_val2014_000000176793.jpg"
  ],
  "yes_question": "Is the lady smiling in image 4?",
  "no_question": "Is the lady frowning in image 4?",
  "ritem": "lady is smiling",
  "hitem": "lady is frowning",
  "yes_question_class": "Sentiment",
  "no_question_class": "Sentiment"
}
```
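To make the question pairing concrete, the sketch below expands one entry into two labeled probes. It assumes the expected answers are "yes" for `yes_question` and "no" for `no_question`, as the field names suggest. The `Probe` type and the `expand_entry` helper are hypothetical names introduced for illustration; they are not part of the dataset or its tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """One binary question with its expected answer (hypothetical helper type)."""
    images: tuple
    question: str
    expected: str  # "yes" or "no" (assumed from the field names)
    question_class: str


def expand_entry(entry: dict) -> list:
    """Split a LIVE entry into its factual and counterfactual probes."""
    images = tuple(entry["image_id"])
    return [
        Probe(images, entry["yes_question"], "yes", entry["yes_question_class"]),
        Probe(images, entry["no_question"], "no", entry["no_question_class"]),
    ]


# The sample entry shown above, reduced to the fields this sketch reads.
sample = {
    "image_id": [
        "COCO_val2014_000000239985.jpg",
        "COCO_val2014_000000376628.jpg",
        "COCO_val2014_000000369763.jpg",
        "COCO_val2014_000000176793.jpg",
    ],
    "yes_question": "Is the lady smiling in image 4?",
    "no_question": "Is the lady frowning in image 4?",
    "yes_question_class": "Sentiment",
    "no_question_class": "Sentiment",
}

for probe in expand_entry(sample):
    print(probe.expected, "|", probe.question)
```

Both probes share the same image list, so a model sees an identical visual context for the factual and the counterfactual question.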
### 🔑 Key Fields
Each JSON entry contains the following structured information:
- **`task`**: The visual recognition task category (e.g., `attributes`, `actions`, `relations`).
- **`type`**: The multi-image scenario type, either **`UIC`** (Uniform Image Context) or **`DIC`** (Diverse Image Context).
- **`qtype`**: The granularity level, indicating the number of target images involved in the query (ranges from `1` to `4`).
- **`image_id`**: A list of associated MS-COCO image filenames required for the scenario.
- **`yes_question`** / **`no_question`**: The balanced binary visual questions. The `yes_question` targets factual content, while the `no_question` targets the hallucinated (counterfactual) probe.
- **`ritem`** / **`hitem`**: The underlying real (factual) and hallucinated (counterfactual) visual items extracted during the MHI mining process.
- **`yes_question_class`** / **`no_question_class`**: The specific cognitive task class for the question (e.g., `Sentiment`, `Color`, `Position`, `Object`).
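
The `type` and `qtype` fields are enough to break results down by scenario and granularity. The following is a minimal sketch, not the authors' evaluation code: it assumes a hallucination is counted whenever a model answers "yes" to a `no_question`, and it casts `qtype` from its string form to an integer. The function name `hallucination_rates` and the `answers` list are hypothetical.

```python
from collections import defaultdict


def hallucination_rates(entries, answers):
    """Return {(type, qtype): share of 'yes' replies to no_question}.

    `answers[i]` is the model's raw reply to entries[i]["no_question"].
    Treating a "yes" reply as a hallucination is an assumption of this sketch.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [hallucinated, total]
    for entry, answer in zip(entries, answers):
        key = (entry["type"], int(entry["qtype"]))
        counts[key][1] += 1
        if answer.strip().lower().startswith("yes"):
            counts[key][0] += 1
    return {key: bad / total for key, (bad, total) in counts.items()}


# Toy inputs for illustration only; these are not real benchmark results.
entries = [
    {"type": "UIC", "qtype": "4"},
    {"type": "UIC", "qtype": "4"},
    {"type": "DIC", "qtype": "1"},
]
answers = ["Yes, she is.", "No.", "no"]

rates = hallucination_rates(entries, answers)
for (scenario, k), rate in sorted(rates.items()):
    print(f"{scenario} k={k}: {rate:.2f}")
```

Matching on a leading "yes" is a deliberately simple answer parser; free-form model replies may need a stricter one.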
---
## 🚀 How to Use
You can easily load the question-answer pairs using the `datasets` library.
```python
from datasets import load_dataset

# Load the main split of the benchmark from the Hugging Face Hub
dataset = load_dataset("Tong613/LIVE-multi-image-bench", data_dir="main")

# Print the first evaluation sample
print(dataset["train"][0])
```
Provider: Tong613



