
Chest ImaGenome Dataset

Source: physionet.org (indexed 2025-01-21)
Download link:
https://physionet.org/content/chest-imagenome/1.0.0/
Resource description:
In recent years, the release of multiple large datasets has made automatic interpretation of chest X-ray (CXR) images with deep learning models feasible, either for specific abnormalities or for generating preliminary reports. However, although some of these models are reported to perform at levels similar to radiologists, quantitative evaluation of their explainability is hampered by the lack of locally labeled datasets for different findings. Apart from a few small-scale, human-labeled datasets for specific findings, such as pneumonia and pneumothorax, most CXR deep learning models to date are trained on global "weak" labels extracted from text reports, or via a joint image and unstructured text learning strategy. In our work, a joint rule-based natural language processing (NLP) and CXR atlas-based bounding box detection pipeline is used to automatically label 242,072 frontal MIMIC CXRs locally. Inspired by the Visual Genome effort in the computer vision community [20], we constructed the first Chest ImaGenome dataset, which uses a scene graph data structure to describe the data. Through a radiologist-constructed CXR ontology, the annotations for each CXR are connected as an anatomy-centered scene graph, useful for image-level reasoning and multimodal fusion applications. Overall, our dataset contributes significantly to the research community by providing 1) 1,256 combinations of relation annotations between 29 CXR anatomical locations (objects with bounding box coordinates) and their attributes, structured as a scene graph per image; 2) over 670,000 localized comparison relations (improved, worsened, or no change) between the anatomical locations across sequential exams; and 3) a manually annotated gold standard scene graph dataset from 500 unique patients.
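To make the "anatomy-centered scene graph" idea concrete, here is a minimal sketch of one image's graph: anatomical locations are the objects (each with a bounding box), and findings are attached to them as attributes. The class names, field names, and the example locations and finding are hypothetical illustrations; they do not reproduce the dataset's actual file schema, which is documented on the PhysioNet page.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in pixel coordinates, (x1, y1) to (x2, y2)."""
    x1: int
    y1: int
    x2: int
    y2: int

    def area(self) -> int:
        # Clamp at zero so a degenerate box never yields a negative area.
        return max(0, self.x2 - self.x1) * max(0, self.y2 - self.y1)


@dataclass
class AnatomicalObject:
    """One anatomical location (a scene-graph object) with its bounding box."""
    name: str  # hypothetical label, e.g. "right lung"
    bbox: BBox
    attributes: set[str] = field(default_factory=set)  # findings linked to this location


@dataclass
class SceneGraph:
    """Anatomy-centered scene graph for a single CXR image (illustrative only)."""
    image_id: str
    objects: dict[str, AnatomicalObject] = field(default_factory=dict)

    def add_object(self, obj: AnatomicalObject) -> None:
        self.objects[obj.name] = obj

    def add_attribute(self, location: str, attribute: str) -> None:
        # Raises KeyError if the location was never added as an object.
        self.objects[location].attributes.add(attribute)

    def locations_with(self, attribute: str) -> list[str]:
        # Image-level query: which anatomical locations carry this attribute?
        return sorted(
            name for name, obj in self.objects.items() if attribute in obj.attributes
        )


if __name__ == "__main__":
    graph = SceneGraph(image_id="example-image")
    graph.add_object(AnatomicalObject("right lung", BBox(10, 20, 110, 220)))
    graph.add_object(AnatomicalObject("left lung", BBox(120, 20, 220, 220)))
    graph.add_attribute("right lung", "lung opacity")
    print(graph.locations_with("lung opacity"))
```

Keying objects by location name keeps attribute lookups direct, and because each finding hangs off a box, a query result can be traced back to an image region, which is the kind of local grounding the abstract says global "weak" labels lack.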

Provider:
physionet.org
Dataset Introduction
Background Overview
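The Chest ImaGenome Dataset is a large-scale scene graph dataset for chest X-rays (CXRs), released in 2021, containing automatically generated annotations for 242,072 frontal CXR images. Using natural language processing and atlas-based bounding box detection, it structures the knowledge in radiology reports into anatomy-centered scene graphs, providing 1,256 combinations of relation annotations between 29 anatomical locations and their attributes, as well as over 670,000 localized comparison relations describing changes between sequential exams. It is the first dataset to bring a Visual Genome-style graph structure to the CXR domain, with the aim of supporting explainability evaluation and multimodal fusion research, and it comes with a manually annotated gold standard dataset for quality validation.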
The summary above was compiled and generated by 遇见数据集.
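The comparison relations link the same anatomical location across two sequential exams with one of three labels (improved, worsened, or no change). The sketch below shows one way such relations could be represented and tallied per location. The `ComparisonRelation` and `Change` types, the `change_counts` helper, and the exam identifiers are hypothetical and are not taken from the dataset's release files.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Change(Enum):
    # The three comparison labels named in the dataset description.
    IMPROVED = "improved"
    WORSENED = "worsened"
    NO_CHANGE = "no change"


@dataclass(frozen=True)
class ComparisonRelation:
    """Links one anatomical location across a prior exam and a current exam."""
    prior_image_id: str
    current_image_id: str
    location: str
    change: Change


def change_counts(relations: list[ComparisonRelation], location: str) -> Counter:
    # Tally how often a location was labelled improved / worsened / no change.
    # Labels that never occur are absent from the Counter and read back as 0.
    return Counter(r.change for r in relations if r.location == location)


if __name__ == "__main__":
    relations = [
        ComparisonRelation("exam-1", "exam-2", "right lung", Change.WORSENED),
        # Change("improved") parses the raw text label into the enum member.
        ComparisonRelation("exam-2", "exam-3", "right lung", Change("improved")),
        ComparisonRelation("exam-1", "exam-2", "left lung", Change.NO_CHANGE),
    ]
    print(change_counts(relations, "right lung"))
```

Using an `Enum` restricts labels to the three documented values: `Change("better")` raises `ValueError` instead of silently introducing a fourth category.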