REAL-MM-RAG_TechReport
Source: ModelScope community (魔搭社区). Updated 2025-12-05; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/ibm-research/REAL-MM-RAG_TechReport
Resource description:
# REAL-MM-RAG-Bench: A Real-World Multi-Modal Retrieval Benchmark
We introduce REAL-MM-RAG-Bench, a real-world multi-modal retrieval benchmark designed to evaluate retrieval models in reliable, challenging, and realistic settings. The benchmark was constructed with an automated pipeline: queries were generated by a vision-language model (VLM), filtered by a large language model (LLM), and then rephrased by an LLM to ensure high-quality retrieval evaluation. To simulate real-world retrieval challenges, we introduce multi-level query rephrasing, which modifies each query at three distinct levels, ranging from minor wording adjustments to significant structural changes. This tests models on their semantic understanding rather than on simple keyword matching.
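To see why heavier rephrasing makes keyword matching less useful, one can measure how many words a rephrased query still shares with the original. The sketch below computes Jaccard token overlap. The three rephrasings and their level labels (`minor`, `moderate`, `major`) are invented for illustration and are not taken from the dataset; the helper names `tokens` and `jaccard` are likewise hypothetical.

```python
import re


def tokens(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


original = "What does a Safeguarded backup policy control in IBM FlashSystem?"

# Invented rephrasings (assumption): each level drifts further from the original wording.
rephrasings = {
    "minor": "What does a Safeguarded backup policy manage in IBM FlashSystem?",
    "moderate": "Which settings are governed by a Safeguarded backup policy on IBM FlashSystem?",
    "major": "On IBM storage arrays, which aspects of protected copies do retention rules determine?",
}

overlap = {level: jaccard(original, text) for level, text in rephrasings.items()}
for level, score in overlap.items():
    print(f"{level}: {score:.2f}")
```

In this toy case the overlap shrinks with each level, so a retriever that relies on shared keywords has progressively less to work with, while the underlying information need stays the same.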
## **REAL-MM-RAG_TechReport**
- **Content**: 17 technical documents on IBM FlashSystem.
- **Size**: 1,674 pages.
- **Composition**: Text-heavy with visual elements and structured tables.
- **Purpose**: Assesses model performance in retrieving structured technical content.
## Loading the Dataset
To use the dataset, install the `datasets` library and load it as follows:
```python
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("ibm-research/REAL-MM-RAG_TechReport", split="test")

# Index queries to image filenames (rows without a query are skipped)
query_to_image = {ex['query']: ex['image_filename'] for ex in dataset if ex['query'] is not None}

# Index image filenames to their associated queries.
# If a page's row has no query, its list contains None.
image_to_queries = {}
for ex in dataset:
    image_to_queries.setdefault(ex['image_filename'], []).append(ex['query'])

# Example 1: Find the image for a specific query
query_example = "What does a Safeguarded backup policy control in IBM FlashSystem?"
if query_example in query_to_image:
    image_filename = query_to_image[query_example]
    print(f"Query '{query_example}' is linked to image: {image_filename}")

# Example 2: Find all queries linked to a specific image
image_example = "IBM FlashSystem Safeguarded Copy Implementation Guide_page_36.png"
if image_example in image_to_queries:
    linked_queries = image_to_queries[image_example]
    print(f"Image '{image_example}' is linked to queries: {linked_queries}")

# Example 3: Handle a page that has no queries (it is only part of the corpus)
image_example = "IBM Storage FlashSystem 7300 Product Guide Updated for IBM Storage Virtualize 8.7_page_20.png"
if image_example in image_to_queries:
    linked_queries = image_to_queries[image_example]
    print(f"Image '{image_example}' is linked to queries: {linked_queries}")
```
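Once queries are mapped to their ground-truth pages, a retriever's output can be scored against that mapping. The following is a minimal sketch of Recall@k, assuming each query has a single relevant page, as in the `query_to_image` dictionary above. The function name `recall_at_k`, the `rankings` structure, and the toy file names are illustrative assumptions, not part of the dataset or of the paper's evaluation code.

```python
from typing import Dict, List


def recall_at_k(
    rankings: Dict[str, List[str]],
    query_to_image: Dict[str, str],
    k: int = 1,
) -> float:
    """Fraction of queries whose ground-truth page appears in the top-k results."""
    if not query_to_image:
        return 0.0
    hits = 0
    for query, gold_image in query_to_image.items():
        # A query missing from the rankings counts as a miss.
        top_k = rankings.get(query, [])[:k]
        if gold_image in top_k:
            hits += 1
    return hits / len(query_to_image)


# Toy data standing in for the real mapping and a hypothetical retriever's output.
toy_query_to_image = {
    "q1": "doc_page_1.png",
    "q2": "doc_page_2.png",
}
toy_rankings = {
    "q1": ["doc_page_1.png", "doc_page_3.png"],
    "q2": ["doc_page_3.png", "doc_page_2.png"],
}

print(recall_at_k(toy_rankings, toy_query_to_image, k=1))  # 0.5
print(recall_at_k(toy_rankings, toy_query_to_image, k=2))  # 1.0
```

With the real data, `rankings` would map each query string to the image filenames returned by the retrieval model under test, ordered from most to least relevant.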
## Source Paper
```bibtex
@misc{wasserman2025realmmragrealworldmultimodalretrieval,
title={REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark},
author={Navve Wasserman and Roi Pony and Oshri Naparstek and Adi Raz Goldfarb and Eli Schwartz and Udi Barzelay and Leonid Karlinsky},
year={2025},
eprint={2502.12342},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2502.12342},
}
```
Provider:
maas
Created:
2025-10-12



