Publicly Available Breast MRI Dataset for FGT Classification and Analysis
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14921216
Resource description:
Overview
This dataset was collected to support research in classifying the amount of fibroglandular tissue (FGT) in breast MRI scans. It includes both 2D and 3D MRI images from patients with varying levels of FGT, annotated according to the BI-RADS lexicon categories. The dataset aims to facilitate the development and evaluation of machine learning and deep learning models for FGT classification, which can contribute to breast cancer risk assessment.
Dataset Collection and Annotation
The initial dataset consisted of 826 breast MRI scans acquired at Imam Khomeini Hospital Complex, Tehran, between July 2017 and May 2022. Cases were excluded if they met any of the following criteria, because these conditions can affect classification of the normal contralateral breast:
Bilateral breast cancer
Presence of prostheses or prior mammoplasty
History of bilateral surgery
Poor MRI quality due to motion artifacts
After applying these exclusion criteria, 654 MRI scans remained. Two radiologists independently assessed the scans using 3D axial T1-weighted fat-saturated pre-contrast sequences, classifying them into BI-RADS lexicon categories:
A: Almost entirely fat
B: Scattered fibroglandular tissue
C: Heterogeneous fibroglandular tissue
D: Extreme fibroglandular tissue
When the two radiologists disagreed, a third radiologist was consulted to reach a consensus. The final dataset consists of 654 MRI scans, with one scan per patient. Patient ages range from 19 to 76 years, with a mean age of 43.24 ± 9.64 years.
MRI Acquisition Details
All MRI scans were obtained using either:
1.5T scanner (Optima MR360, GE Healthcare)
3T scanner (Discovery MR750w, GE Healthcare)
Dataset Structure
3D Dataset
The 3D dataset consists of MRI scans converted from DICOM to NIfTI format. These volumes keep their original spatial resolution and preserve the volumetric information needed for 3D analysis.
2D Dataset
For the 2D dataset, three middle slices were extracted from each patient's scan, located at the 40th, 50th, and 60th percentiles of the full 3D volume. These slices were converted from DICOM to PNG format and used for 2D assessments. They were chosen because they represent the central portion of the 3D scan, which typically contains the most relevant anatomical and tissue information. Three slices were used so that the data is compatible with pre-trained 2D deep learning models, which typically require three-channel input.
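The slice selection described above can be sketched as follows. This is only an illustration, not the authors' preprocessing code: the function names are hypothetical, the mapping from a percentile to a slice index (rounding `p / 100 * (n_slices - 1)`) is an assumption, and so is the choice of the last array axis as the slice axis. Stacking the three slices as channels is one plausible way to feed them to a three-channel model.

```python
import numpy as np

def middle_slice_indices(n_slices, percentiles=(40, 50, 60)):
    """Return slice indices at the given percentiles of the slice range."""
    # Assumption: percentile p maps to index round(p * (n_slices - 1) / 100).
    return [int(round(p * (n_slices - 1) / 100)) for p in percentiles]

def stack_middle_slices(volume, axis=-1):
    """Build an (H, W, 3) array from three central slices of a 3D volume."""
    idx = middle_slice_indices(volume.shape[axis])
    slices = np.take(volume, idx, axis=axis)  # keep only the three chosen slices
    return np.moveaxis(slices, axis, -1)      # move the slice axis last, as channels
```

For a volume with 101 slices along the chosen axis, the selected indices are 40, 50, and 60.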
Data Splitting
The 2D and 3D datasets were randomly divided into three sets:
Training set: 498 patients
Validation set: 57 patients
Test set: 99 patients
The split preserved the distribution of BI-RADS categories across all three sets. The 2D and 3D versions of each set (training, validation, and test) contain the same patients.
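A random split that keeps the category distribution consistent is commonly implemented as a stratified split. The sketch below shows one such approach using only the standard library. It is not the authors' procedure: the function name, the seed, and the per-class rounding are assumptions, and the default fractions only approximate the reported 498 / 57 / 99 split of 654 patients (about 76%, 9%, and 15%), so it will not reproduce the published sets.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.76, 0.09, 0.15), seed=0):
    """Split names into train/val/test so each class keeps similar proportions.

    labels: dict mapping an item name to its class label.
    fractions: approximate train and validation shares; the test set
    receives whatever remains in each class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for name, label in sorted(labels.items()):
        by_class[label].append(name)

    train, val, test = [], [], []
    for names in by_class.values():
        rng.shuffle(names)  # random order within each class
        n_train = round(fractions[0] * len(names))
        n_val = round(fractions[1] * len(names))
        train += names[:n_train]
        val += names[n_train:n_train + n_val]
        test += names[n_train + n_val:]
    return train, val, test
```

Because the allocation is done per class, every category appears in each set in roughly the same ratio, at the cost of set sizes that can differ slightly from the targets.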
Labeling Information
The Labels.csv file provides essential metadata and labeling information. It contains the following columns:
Name: Matches the file names of the 3D NIfTI files and the names of the folders containing the three 2D slices
FGT: The FGT category label based on the BI-RADS lexicon (A: 1, B: 2, C: 3, D: 4)
Side: The normal (contralateral) breast side(s) (L: left, R: right, B: both)
Additionally, a Patient_Orientation_Guide.png image is included to show which side of the image corresponds to the patient's left.
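As a minimal sketch of how the label file might be read, the snippet below parses the three columns listed above with the standard `csv` module. It assumes that Labels.csv is comma-delimited, has a header row with exactly the column names Name, FGT, and Side, and stores FGT as the integers 1 to 4. None of this is confirmed by the description, and `read_labels` is a hypothetical helper name.

```python
import csv

# Mappings taken from the column descriptions above.
FGT_CATEGORIES = {1: "A", 2: "B", 3: "C", 4: "D"}
SIDES = {"L": "left", "R": "right", "B": "both"}

def read_labels(csv_file):
    """Parse Labels.csv rows from an open text file object."""
    records = []
    for row in csv.DictReader(csv_file):
        records.append({
            "name": row["Name"],
            # Assumption: FGT is stored as an integer code 1-4.
            "fgt": FGT_CATEGORIES[int(row["FGT"])],
            "side": SIDES[row["Side"].strip()],
        })
    return records
```

A typical call would be `with open("Labels.csv", newline="") as f: records = read_labels(f)`. An unexpected FGT or Side value raises a `KeyError`, which makes format mismatches visible early.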
Created:
2025-04-12



