
REAL-MM-RAG_TechReport

ModelScope Community · Updated 2025-12-05 · Indexed 2025-12-06
Download link:
https://modelscope.cn/datasets/ibm-research/REAL-MM-RAG_TechReport
Description:
# REAL-MM-RAG-Bench: A Real-World Multi-Modal Retrieval Benchmark

We introduce REAL-MM-RAG-Bench, a real-world multi-modal retrieval benchmark designed to evaluate retrieval models in reliable, challenging, and realistic settings. The benchmark was built with an automated pipeline: a vision-language model (VLM) generated the queries, a large language model (LLM) filtered them, and an LLM rephrased them to ensure high-quality retrieval evaluation.
To simulate real-world retrieval challenges, we introduce multi-level query rephrasing: each query is modified at three distinct levels, ranging from minor wording adjustments to significant structural changes, so that models are tested on semantic understanding rather than simple keyword matching.

## **REAL-MM-RAG_TechReport**

- **Content**: 17 technical documents on IBM FlashSystem.
- **Size**: 1,674 pages.
- **Composition**: Text-heavy with visual elements and structured tables.
- **Purpose**: Assesses model performance in retrieving structured technical content.

## Loading the Dataset

To use the dataset, install the `datasets` library and load it as follows:

```python
from datasets import load_dataset

# Load the test split
dataset = load_dataset("ibm-research/REAL-MM-RAG_TechReport", split="test")

# Access the text columns directly so the page images are not decoded row by row
queries = dataset["query"]
filenames = dataset["image_filename"]

# Index queries to image filenames (rows without a query are skipped)
query_to_image = {q: f for q, f in zip(queries, filenames) if q is not None}

# Index image filenames to their associated queries.
# Every page is registered; pages without queries map to an empty list instead of [None].
image_to_queries = {}
for q, f in zip(queries, filenames):
    linked = image_to_queries.setdefault(f, [])
    if q is not None:
        linked.append(q)

# Example 1: Find the image for a specific query
query_example = "What does a Safeguarded backup policy control in IBM FlashSystem?"
if query_example in query_to_image:
    image_filename = query_to_image[query_example]
    print(f"Query '{query_example}' is linked to image: {image_filename}")

# Example 2: Find all queries linked to a specific image
image_example = "IBM FlashSystem Safeguarded Copy Implementation Guide_page_36.png"
if image_example in image_to_queries:
    linked_queries = image_to_queries[image_example]
    print(f"Image '{image_example}' is linked to queries: {linked_queries}")

# Example 3: Handle a page that has no queries (it is only part of the retrieval corpus)
image_example = "IBM Storage FlashSystem 7300 Product Guide Updated for IBM Storage Virtualize 8.7_page_20.png"
if image_example in image_to_queries:
    linked_queries = image_to_queries[image_example]
    if linked_queries:
        print(f"Image '{image_example}' is linked to queries: {linked_queries}")
    else:
        print(f"Image '{image_example}' has no linked queries")
```

## Source Paper

```bibtex
@misc{wasserman2025realmmragrealworldmultimodalretrieval,
  title={REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark},
  author={Navve Wasserman and Roi Pony and Oshri Naparstek and Adi Raz Goldfarb and Eli Schwartz and Udi Barzelay and Leonid Karlinsky},
  year={2025},
  eprint={2502.12342},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2502.12342},
}
```
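The query-to-page lookup built in *Loading the Dataset* is enough to score a retriever. The sketch below computes mean Recall@k: for each query, the fraction of its labeled pages that appear among the retriever's top-k page filenames, averaged over queries. It is a minimal illustration under stated assumptions, not the paper's evaluation code: rankings are assumed to come from your own retriever as ordered lists of `image_filename` values, and the helpers `recall_at_k` and `relevant_from_index`, along with the toy filenames, are hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(
    relevant: Dict[str, Set[str]],
    rankings: Dict[str, List[str]],
    k: int,
) -> float:
    """Mean Recall@k over queries (hypothetical helper).

    relevant: query -> set of page filenames labeled relevant
    rankings: query -> page filenames ordered by a retriever, best first
    Per query, Recall@k = |relevant & top-k| / |relevant|.
    A query missing from `rankings` scores 0.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    scores = []
    for query, gold in relevant.items():
        if not gold:
            continue  # nothing to retrieve for this query
        top_k = set(rankings.get(query, [])[:k])
        scores.append(len(gold & top_k) / len(gold))
    return sum(scores) / len(scores) if scores else 0.0


def relevant_from_index(query_to_image: Dict[str, str]) -> Dict[str, Set[str]]:
    """Turn a one-page-per-query index (like query_to_image) into relevance sets."""
    return {q: {f} for q, f in query_to_image.items()}


if __name__ == "__main__":
    # Toy stand-in for the real split; filenames are hypothetical.
    query_to_image = {"q1": "guide_page_1.png", "q2": "guide_page_7.png"}
    rankings = {
        "q1": ["guide_page_1.png", "guide_page_2.png"],
        "q2": ["guide_page_3.png", "guide_page_7.png"],
    }
    gold = relevant_from_index(query_to_image)
    print(recall_at_k(gold, rankings, k=1))  # 0.5
    print(recall_at_k(gold, rankings, k=2))  # 1.0
```

Because `query_to_image` keeps one page per query, each query has a single relevant page here, so Recall@k reduces to the share of queries whose page is ranked in the top k. The set-based form also handles queries labeled with several relevant pages.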
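Multi-level rephrasing is meant to expose retrievers that rely on keyword overlap, so a natural check is to score the same retriever on each rephrasing level and compare. The sketch below groups (query, page) pairs per level and reports each level's relative drop against the original queries. The column names in `LEVEL_COLUMNS` (`rephrase_level_1` to `rephrase_level_3`) are an assumption that this card does not confirm; inspect `dataset.column_names` and adjust them before use. `pairs_by_level` and `relative_drop` are hypothetical helpers, and any retrieval metric can supply the per-level scores.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# ASSUMPTION: original query column plus three rephrasing-level columns.
# These names are not confirmed by the dataset card; check dataset.column_names.
LEVEL_COLUMNS = ["query", "rephrase_level_1", "rephrase_level_2", "rephrase_level_3"]


def pairs_by_level(
    columns: Dict[str, Sequence[Optional[str]]],
    level_columns: Sequence[str] = LEVEL_COLUMNS,
    page_column: str = "image_filename",
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (query text, page filename) pairs by rephrasing level.

    `columns` maps a column name to its values, e.g. {c: dataset[c] for c in names}.
    Rows whose text is None at a level (pages without queries) are skipped,
    and levels absent from `columns` are left out of the result.
    """
    pages = columns[page_column]
    grouped: Dict[str, List[Tuple[str, str]]] = {}
    for level in level_columns:
        if level not in columns:
            continue
        grouped[level] = [
            (text, page) for text, page in zip(columns[level], pages) if text is not None
        ]
    return grouped


def relative_drop(scores: Dict[str, float], base_level: str = "query") -> Dict[str, float]:
    """Relative score drop of each level versus the base level: (base - s) / base."""
    base = scores[base_level]
    if base == 0:
        raise ValueError("base score is zero; relative drop is undefined")
    return {level: (base - s) / base for level, s in scores.items()}


if __name__ == "__main__":
    # Toy columns; only one rephrasing level is present here.
    cols = {
        "query": ["What does policy X control?", None],
        "rephrase_level_1": ["Which settings does policy X govern?", None],
        "image_filename": ["p1.png", "p2.png"],
    }
    print(pairs_by_level(cols))
    print(relative_drop({"query": 0.5, "rephrase_level_3": 0.375}))
```

With the real split, passing `{c: dataset[c] for c in LEVEL_COLUMNS + ["image_filename"] if c in dataset.column_names}` reads only text columns, so page images are not decoded. A drop that grows with the level suggests the retriever leans on lexical overlap with the page.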

Provider:
maas
Created:
2025-10-12