Mammogram Density Assessment Dataset
Source: doi.org (indexed 2025-03-27)
Download link:
http://doi.org/10.17632/tdx3h2fn9v.5
Resource description:
GENERAL OVERVIEW
This dataset was compiled to address the limitations of current methods for breast density assessment in mammograms, especially the challenges of:
• Shortage of radiologists: There are not enough radiologists to efficiently analyze the large number of mammograms needed for screening.
• Subjectivity: Radiologist assessments of breast density can vary, leading to inconsistencies.
• Limitations of existing tools: Current CAD tools for breast density estimation are often restricted to specific mammogram views and have difficulty producing accurate segmentations.
This dataset addresses these challenges by supplementing the original mammogram images with:
• Binary masks of the breast area: These expert-annotated masks precisely delineate the entire breast region in each mammogram, providing valuable ground truth data for segmentation methods.
• Binary masks of dense tissue: Similarly, these masks accurately identify areas of dense tissue within each mammogram, further enhancing the dataset's utility for training and evaluating segmentation algorithms.
The dataset facilitates the development of automated breast density estimation with deep learning. It also serves as a valuable tool for researchers developing and benchmarking medical image segmentation methods specifically focused on breast tissue analysis in mammograms.
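As an illustration of how the masks could serve as ground truth when benchmarking a segmentation method, the sketch below computes the Dice coefficient between a predicted mask and a reference mask. The dataset description does not name an evaluation metric, so the choice of Dice, the function name dice_score, and the toy arrays are all assumptions made for this example.

```python
import numpy as np


def dice_score(pred, truth):
    """Dice coefficient between two binary masks (1.0 means perfect overlap)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = int(pred.sum()) + int(truth.sum())
    if total == 0:
        # Both masks empty: treated as perfect agreement (a common convention).
        return 1.0
    return 2.0 * int((pred & truth).sum()) / total


# Toy 2x3 masks standing in for a predicted and an annotated dense-tissue mask.
truth = np.array([[1, 1, 0], [0, 0, 0]], dtype=bool)
pred = np.array([[1, 0, 0], [1, 0, 0]], dtype=bool)
print(dice_score(pred, truth))  # 0.5
```

Other overlap measures, such as intersection over union, could be substituted in the same way.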
DATA DESCRIPTION
This dataset consists of segmentation masks for dense tissue and breast area, together with area-based breast density percentage values, for mammograms from the VinDr-Mammo public dataset, accessible from [3]. All annotations were performed and validated by an expert radiologist.
Files:
The data is provided in two compressed archives, ‘train.zip’ and ‘test.zip’.
• train.zip: This archive contains two sub-folders:
- breast_masks: This sub-folder contains the ground truth segmentation masks for the breast area, in JPG format.
- dense_masks: This sub-folder contains the ground truth segmentation masks for the dense tissue, also in JPG format.
The segmentation masks are 2800×3518 pixels.
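An area-based density percentage can be sketched as the dense-tissue area divided by the breast area, multiplied by 100. The example below is a minimal sketch under several assumptions: the masks are read as 8-bit grayscale arrays, a mid-gray threshold of 127 is used to undo JPG compression noise, and dense pixels outside the breast mask are ignored. The description does not state that the published density values were computed exactly this way, and the function names are hypothetical.

```python
import numpy as np


def binarize(mask, threshold=127):
    """Turn an 8-bit grayscale mask into a boolean array.

    JPG is lossy, so mask pixels are not guaranteed to be exactly 0 or 255;
    thresholding at mid-gray (an assumed cut-off) restores a binary mask.
    """
    return np.asarray(mask) > threshold


def area_density_percent(breast_mask, dense_mask, threshold=127):
    """Dense-tissue area as a percentage of the breast area."""
    breast = binarize(breast_mask, threshold)
    # Count only dense pixels that fall inside the breast region.
    dense = binarize(dense_mask, threshold) & breast
    breast_area = int(breast.sum())
    if breast_area == 0:
        raise ValueError("breast mask is empty")
    return 100.0 * int(dense.sum()) / breast_area


# Tiny synthetic arrays standing in for a real mask pair.
breast = np.zeros((4, 4), dtype=np.uint8)
breast[:, :2] = 250   # 8 breast pixels, slightly off 255 as JPG might produce
dense = np.zeros((4, 4), dtype=np.uint8)
dense[:2, :1] = 255   # 2 dense pixels inside the breast
dense[3, 3] = 255     # stray pixel outside the breast, ignored
print(area_density_percent(breast, dense))  # 25.0
```

With Pillow installed, a real mask could be loaded with np.asarray(Image.open(path).convert("L")) and passed to the same function.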
File Lists:
Two CSV files are provided alongside the compressed archives:
• train.csv: This file contains information about the training set with two columns:
- Filename: This column contains the filenames of the training set images. These images can be directly downloaded from the VinDr-Mammo dataset, https://physionet.org/content/vindr-mammo/1.0.0/.
- Density: This column provides the ground truth continuous breast density value for each mammogram in the training set, intended for the breast density estimation task.
• test.csv: This file contains a single column, “Filename”, listing the filenames of the test set. No ground truth is provided for the test set: it is intentionally kept private for the Breast Density Kaggle Challenge, https://www.kaggle.com/competitions/breast-density-prediction, but will eventually be made public in the dataset repository.
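The file lists can be paired with the mask folders in a few lines. The sketch below reads the two columns described for train.csv and builds the expected path of each mask. It assumes that mask files carry the same name as the entry in the Filename column and that the archive is extracted into a folder named train; both should be checked against the actual download. The function name and the sample rows are made up for illustration.

```python
import csv
import io
from pathlib import Path


def load_training_index(csv_file, root="train"):
    """Read (Filename, Density) rows and attach the expected mask paths.

    Assumes masks in breast_masks/ and dense_masks/ share the name given
    in the Filename column; verify this against the extracted archive.
    """
    root = Path(root)
    records = []
    for row in csv.DictReader(csv_file):
        name = row["Filename"]
        records.append({
            "filename": name,
            "density": float(row["Density"]),
            "breast_mask": root / "breast_masks" / name,
            "dense_mask": root / "dense_masks" / name,
        })
    return records


# Made-up rows for illustration; real use would pass open("train.csv", newline="").
sample = io.StringIO("Filename,Density\nexample_a.jpg,12.5\nexample_b.jpg,40.0\n")
index = load_training_index(sample)
print(index[0]["dense_mask"].as_posix(), index[0]["density"])
```

The same reader does not apply to test.csv as written, because that file has no Density column.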
Provider:
Mendeley Data



