MIMIC-Ext-MIMIC-CXR-VQA: A Complex, Diverse, And Large-Scale Visual Question Answering Dataset for Chest X-ray Images

Name: MIMIC-Ext-MIMIC-CXR-VQA: A Complex, Diverse, And Large-Scale Visual Question Answering Dataset for Chest X-ray Images
Creator: PhysioNet
Published: 2024-07-19 18:44:56
License: No description available

Source: DataCite Commons. Updated: 2024-07-19. Indexed: 2025-04-16.

Download link:

https://physionet.org/content/mimic-ext-mimic-cxr-vqa/

Description:

We introduce MIMIC-Ext-MIMIC-CXR-VQA (i.e., extended from the MIMIC database), a complex, diverse, and large-scale dataset designed for Visual Question Answering (VQA) tasks within the medical domain, focusing primarily on chest radiographs. This dataset includes approximately 377K entries derived from the MIMIC-CXR-JPG, MIMIC-IV, and Chest ImaGenome datasets, all sourced from PhysioNet. It features questions generated from 48 unique templates across seven content types: presence, anatomy, attribute, abnormality, size, plane, and gender. Each template, developed under the guidance of a board-certified medical expert to ensure clinical relevance, addresses both standard content from previous medical VQA tasks and more complex scenarios involving set and logical operations. To further enhance linguistic diversity while maintaining a medical context, we implemented a paraphrasing strategy with an average of 16.5 paraphrases per template, developed through carefully designed prompts based on GPT-4. The primary aim of MIMIC-Ext-MIMIC-CXR-VQA is to serve as a comprehensive benchmark for evaluating medical VQA methodologies. However, the significance of this dataset extends beyond medical VQA benchmarking. It not only provides a foundational tool for developing and testing VQA methods but also acts as a valuable resource for instruction tuning of medical Vision-and-Language Models (VLMs), addressing the scarcity of medical instruction datasets. Furthermore, the integration of structured EHRs (i.e., MIMIC-IV) with MIMIC-Ext-MIMIC-CXR-VQA opens new avenues for the development of multi-modal AI frameworks that leverage both imaging and tabular modalities of patient records. By making this dataset publicly accessible, we aim to improve the understanding of medical images and stimulate further innovation within the realm of medical AI.
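The template-plus-paraphrase construction described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the template wording, the slot names (`attribute`, `object`), and the helper `generate_question` are hypothetical and are not taken from the dataset's actual templates or tooling.

```python
import random
from string import Template

# Hypothetical paraphrases of a single "presence" template.
# The dataset's real templates and GPT-4 paraphrases are not reproduced here.
PARAPHRASES = {
    "presence": [
        "Is there any ${attribute} in the ${object}?",
        "Does the ${object} show ${attribute}?",
        "Can ${attribute} be seen in the ${object}?",
    ],
}


def generate_question(content_type, slots, rng):
    """Pick one paraphrase for the content type and fill in its slots."""
    phrasing = rng.choice(PARAPHRASES[content_type])
    return Template(phrasing).substitute(slots)


rng = random.Random(0)  # fixed seed so the choice is reproducible
question = generate_question(
    "presence",
    {"attribute": "pleural effusion", "object": "left lung"},
    rng,
)
print(question)
```

With 48 templates and an average of 16.5 paraphrases per template, a scheme like this would yield roughly 48 × 16.5 = 792 distinct phrasings before any slot values are filled in.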

Provider:

PhysioNet

Created:

2024-07-11

Background overview

MIMIC-Ext-MIMIC-CXR-VQA is a comprehensive and diverse dataset for medical visual question answering, containing 377,391 entries with questions generated from 48 clinically validated templates. It serves as a benchmark for medical VQA and supports the development of multi-modal AI systems by integrating chest X-ray images with structured EHR data.

The overview above was collected and summarized by 遇见数据集.
