Publicly Available Breast MRI Dataset for FGT Classification and Analysis

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/14921216

Description:

Overview

This dataset was collected to support research on classifying the amount of fibroglandular tissue (FGT) in breast MRI scans. It includes both 2D and 3D MRI images from patients with varying levels of FGT, annotated according to the BI-RADS lexicon categories. The dataset aims to facilitate the development and evaluation of machine learning and deep learning models for FGT classification, which can contribute to breast cancer risk assessment.

Dataset Collection and Annotation

The initial dataset consisted of 826 breast MRI scans acquired at Imam Khomeini Hospital Complex, Tehran, between July 2017 and May 2022. Cases were excluded if they met any of the following criteria, because these conditions may affect the classification of the normal contralateral breast:

- Bilateral breast cancer
- Presence of prostheses or prior mammoplasty
- History of bilateral surgery
- Poor MRI quality due to motion artifacts

After applying these exclusion criteria, 654 MRI scans remained. Two radiologists independently assessed the scans using 3D axial T1-weighted fat-saturated pre-contrast sequences, classifying them into BI-RADS lexicon categories:

- A: Almost entirely fat
- B: Scattered fibroglandular tissue
- C: Heterogeneous fibroglandular tissue
- D: Extreme fibroglandular tissue

In cases of disagreement between the two radiologists, a third radiologist was consulted to reach a consensus.

The final dataset consists of 654 MRI scans, with one scan per patient. The patients range in age from 19 to 76 years, with a mean age of 43.24 ± 9.64 years.

MRI Acquisition Details

All MRI scans were obtained using one of two scanners:

- 1.5T scanner (Optima MR360, GE Healthcare)
- 3T scanner (Discovery MR750w, GE Healthcare)

Dataset Structure

3D Dataset

The 3D dataset consists of MRI scans converted from DICOM to NIFTI format for 3D assessment. These images keep their original spatial resolution and provide the volumetric information needed for comprehensive analysis.
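Because the 3D volumes keep their original spatial resolution, it can be useful to check each file's matrix size and voxel spacing before training. The following is a minimal sketch, assuming the files follow the NIfTI-1 layout (a 348-byte header, optionally gzip-compressed); the description does not state the NIfTI version or file extension, and the function names are illustrative. In practice a library such as nibabel is the more common choice.

```python
import gzip
import struct

NIFTI1_HEADER_SIZE = 348  # fixed header length defined by the NIfTI-1 format


def parse_nifti1_header(header):
    """Return (shape, voxel_sizes) from the first 348 bytes of a NIfTI-1 file."""
    if len(header) < NIFTI1_HEADER_SIZE:
        raise ValueError("need at least 348 header bytes")
    # sizeof_hdr (int32 at offset 0) must equal 348; use it to detect byte order.
    for byte_order in ("<", ">"):
        if struct.unpack_from(byte_order + "i", header, 0)[0] == NIFTI1_HEADER_SIZE:
            break
    else:
        raise ValueError("not a NIfTI-1 header (sizeof_hdr != 348)")
    # dim[8]: int16 at offset 40; dim[0] is the number of dimensions.
    dim = struct.unpack_from(byte_order + "8h", header, 40)
    # pixdim[8]: float32 at offset 76; pixdim[1..ndim] are the grid spacings.
    pixdim = struct.unpack_from(byte_order + "8f", header, 76)
    ndim = dim[0]
    if not 1 <= ndim <= 7:
        raise ValueError("invalid number of dimensions: %d" % ndim)
    return dim[1:ndim + 1], pixdim[1:ndim + 1]


def read_nifti1_header(path):
    """Read the header from a .nii or gzip-compressed .nii.gz file."""
    with open(path, "rb") as f:
        is_gzip = f.read(2) == b"\x1f\x8b"  # gzip magic bytes
    opener = gzip.open if is_gzip else open
    with opener(path, "rb") as f:
        return parse_nifti1_header(f.read(NIFTI1_HEADER_SIZE))


# Synthetic header for illustration only; these values are NOT from the dataset.
demo = bytearray(NIFTI1_HEADER_SIZE)
struct.pack_into("<i", demo, 0, NIFTI1_HEADER_SIZE)
struct.pack_into("<8h", demo, 40, 3, 256, 256, 120, 1, 1, 1, 1)
struct.pack_into("<8f", demo, 76, 1.0, 0.5, 0.5, 1.5, 0.0, 0.0, 0.0, 0.0)
demo_shape, demo_voxel_sizes = parse_nifti1_header(bytes(demo))
```

Calling `read_nifti1_header` on each volume and collecting the results is one way to see how spacing differs between the 1.5T and 3T acquisitions before deciding whether to resample.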
2D Dataset

For the 2D dataset, each patient’s scan was reduced to three middle slices taken at the 40th, 50th, and 60th percentiles of the full 3D volume. These slices were converted from DICOM to PNG format and used for 2D assessments. They were chosen because they represent the central portion of the 3D scan, which typically contains the most relevant anatomical and tissue information. Using three slices also keeps the data compatible with pre-trained 2D deep learning models, which typically require three-channel input.

Data Splitting

The 2D and 3D datasets were randomly divided into three sets:

- Training set: 498 patients
- Validation set: 57 patients
- Test set: 99 patients

The split keeps the distribution of BI-RADS categories consistent across all sets. The same patients appear in the 2D and 3D versions of each set (training, validation, and test).

Labeling Information

The Labels.csv file provides the metadata and labels. It contains the following columns:

- Name: Corresponds to the file names of the 3D NIFTI files and to the folders containing the three 2D slices
- FGT: The FGT category label based on the BI-RADS lexicon (A: 1, B: 2, C: 3, D: 4)
- Side: The contralateral breast side, indicating the normal breast side(s) (L: left, R: right, B: both)

Additionally, a Patient_Orientation_Guide.png image is included to show which side of the image corresponds to the patient’s left.
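The slice selection and label encoding described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' preprocessing code: the rounding rule that maps a percentile to a slice index is an assumption (the description does not give one), the helper names are hypothetical, and the sample rows are made up to show the column layout of Labels.csv.

```python
import csv
import io
from collections import Counter

# Encodings stated in the dataset description.
FGT_CODE_TO_BIRADS = {1: "A", 2: "B", 3: "C", 4: "D"}
SIDE_CODES = {"L": "left", "R": "right", "B": "both"}


def percentile_slice_indices(num_slices, percentiles=(40, 50, 60)):
    """Map percentiles of the slice axis to zero-based slice indices.

    Assumption: percentile p maps to p% of (num_slices - 1), rounded to the
    nearest integer. The dataset's exact rule is not documented.
    """
    if num_slices < 1:
        raise ValueError("num_slices must be positive")
    # Integer arithmetic avoids floating-point rounding surprises.
    return [(p * (num_slices - 1) + 50) // 100 for p in percentiles]


def load_labels(csv_file):
    """Parse an open Labels.csv-style text file into a list of dicts."""
    records = []
    for row in csv.DictReader(csv_file):
        code = int(row["FGT"])
        records.append({
            "name": row["Name"],
            "fgt_code": code,
            "birads": FGT_CODE_TO_BIRADS[code],
            "side": SIDE_CODES[row["Side"].strip()],
        })
    return records


# Hypothetical rows for illustration; real names and labels come from Labels.csv.
sample = io.StringIO("Name,FGT,Side\ncase_001,1,L\ncase_002,3,B\ncase_003,3,R\n")
records = load_labels(sample)
class_counts = Counter(r["birads"] for r in records)
middle = percentile_slice_indices(101)
```

With the real file, `load_labels(open("Labels.csv", newline=""))` would return one record per patient, and comparing `class_counts` across the training, validation, and test sets is a quick way to confirm that the BI-RADS distribution is consistent between them. Note that the three split sizes (498 + 57 + 99) add up to the 654 patients in the dataset.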

Created:

2025-04-12
