Chest X-ray dataset for lung segmentation

Source: Mendeley Data. Updated 2024-03-27; indexed 2024-06-26.

Download link:

https://data.mendeley.com/datasets/8gf9vpkhgy


Description:

The proposed dataset has been combined from three popular lung segmentation datasets: Darwin, Montgomery, and Shenzhen. The combined data allow researchers and clinicians to gain access to a good quality dataset, a large proportion of which has been manually annotated. The combined dataset consists of 6,810 images, with corresponding binary masks of lungs with the following distribution of images between the three datasets:

• 6,106 images from the Darwin dataset;
• 139 images from the Montgomery dataset;
• 566 images from the Shenzhen dataset.

The Darwin dataset [1, 2] images include most of the heart, revealing lung opacities behind the heart, which may be relevant for assessing the severity of viral pneumonia. The lower-most part of the lungs, where visible, is defined by the extent of the diaphragm. Where present and not obstructive to the distinguishability of the lungs, the diaphragm is included up until the lower-most visible part of the lungs. A key property of this dataset is that image resolutions, sources, and orientations vary across the dataset, with the smallest image being 156x156 pixels and the largest being 5600x4700 pixels. Furthermore, we included the portable X-ray images which are of significantly lower quality as compared to standard X-rays. A key limitation of the Darwin dataset is that it does not contain lateral X-ray lung segmentations. It is worth noting that lung segmentations were performed by human annotators using Darwin's Auto-Annotate AI and then adjusted and reviewed by expert radiologists.

Both the Montgomery and Shenzhen datasets [3] were published by the United States National Library of Medicine and are made of posteroanterior chest X-ray images. These images are available to foster research in computer-aided diagnosis of pulmonary diseases with a special focus on pulmonary tuberculosis. The datasets were acquired from the Department of Health and Human Services (Maryland, USA) and Shenzhen No. 3 People's Hospital (Shenzhen, China). Both datasets contain normal and abnormal chest X-ray images with manifestations of tuberculosis and include associated radiologist readings.

References:

1. Darwin’s Auto-Annotate AI. Available: https://www.v7labs.com/automated-annotation
2. COVID-19 X-ray dataset. Available: https://github.com/v7labs/covid-19-xray-dataset
3. Jaeger S, Candemir S, Antani S, Wáng Y-XJ, Lu P-X, Thoma G. Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg. 2014;4: 475–477. doi:10.3978/j.issn.2223-4292.2014.11.20
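
The description notes that image sizes range from 156x156 to 5600x4700 pixels, so anyone training on the combined data will likely need to bring images and masks to a common size. The sketch below is one possible way to resize a binary mask without introducing values other than 0 and 1, using nearest-neighbor sampling. It is an illustration only: the in-memory mask representation (a list of rows of 0/1 integers) and the function name `resize_nearest` are assumptions made here, not part of the dataset or its tooling.

```python
from typing import List

# Hypothetical in-memory representation: rows of 0/1 integers.
Mask = List[List[int]]


def resize_nearest(mask: Mask, out_h: int, out_w: int) -> Mask:
    """Resize a 2-D grid with nearest-neighbor sampling.

    Each output pixel copies one input pixel, so a 0/1 mask stays 0/1
    (unlike bilinear or bicubic resizing, which produce in-between values).
    """
    in_h, in_w = len(mask), len(mask[0])
    out: Mask = []
    for y in range(out_h):
        src_y = (y * in_h) // out_h  # source row for this output row
        row = [mask[src_y][(x * in_w) // out_w] for x in range(out_w)]
        out.append(row)
    return out


small = [
    [0, 1],
    [1, 0],
]
big = resize_nearest(small, 4, 4)
# big == [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
```

In practice an image library would do the resizing; the point of the sketch is only that the mask should be resampled with a method that preserves its binary values, while the X-ray image itself can use a smoother interpolation.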


Created:

2024-01-23

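Curated summary

Dataset introduction

Background and challenges

Background overview

This is a chest X-ray image dataset for lung segmentation, containing 6,810 images with corresponding binary masks. Image resolutions and sources vary, and the segmentations for some images were adjusted and reviewed by expert radiologists. The dataset combines three sub-datasets (Darwin, Montgomery, and Shenzhen) and is suited to research on computer-aided diagnosis of pulmonary diseases.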

The summary above was collected and generated by 遇见数据集.
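
Since every image comes with a binary lung mask, a segmentation model trained on this data is usually judged by how well its predicted mask overlaps the reference mask. The sketch below computes the Dice coefficient, a common overlap score that is chosen here for illustration and is not mentioned in the dataset description. The function name `dice` and the list-of-rows mask format are assumptions.

```python
from typing import List

# Hypothetical in-memory representation: rows of 0/1 integers.
Mask = List[List[int]]


def dice(pred: Mask, ref: Mask) -> float:
    """Dice coefficient between two binary masks of identical shape.

    Dice = 2 * |pred AND ref| / (|pred| + |ref|), ranging from 0 to 1.
    Two empty masks are treated as a perfect match (1.0) by convention here.
    """
    intersection = sum(
        p & r for pred_row, ref_row in zip(pred, ref) for p, r in zip(pred_row, ref_row)
    )
    total = sum(map(sum, pred)) + sum(map(sum, ref))
    return 1.0 if total == 0 else 2.0 * intersection / total


reference = [
    [1, 1],
    [0, 0],
]
predicted = [
    [1, 0],
    [0, 0],
]
score = dice(predicted, reference)  # 2 * 1 / (1 + 2) = 0.666...
```

Both masks must have the same dimensions, which is one more reason to resize image and mask pairs consistently before comparing them.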
