Histopathology images for end-to-end AI, based on TCGA-BRCA
Indexed in NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/5337008
Description:
These histopathological images are derived from the TCGA-BRCA breast cancer histology dataset at https://portal.gdc.cancer.gov/ (please check this website for the original data license). They can be used in end-to-end artificial intelligence (AI) workflows such as DeepMed (https://github.com/KatherLab/deepmed), which predict high-level features directly from digital images using weakly supervised transfer learning. Here, we provide two subsets of these digitized images:
1) TCGA-BRCA-A2: all images from Walter Reed National Military Medical Center (tissue source site code A2, N = 100 images) in the TCGA-BRCA database (tcga-brca-a2-deepmed-tiles.zip)
2) TCGA-BRCA-E2: all images from Roswell Park Comprehensive Cancer Center (tissue source site code E2, N = 90 images) in the TCGA-BRCA database (tcga-brca-e2-deepmed-tiles.zip)
See also the list of TCGA tissue source site codes: https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/tissue-source-site-codes
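The recommended A2/E2 split is a split by tissue source site, which is the second dash-separated field of a TCGA barcode. As a minimal sketch, assuming that tiles or table rows are keyed by TCGA patient or slide barcodes, the cohorts can be separated like this; the function names and the example barcodes are hypothetical.

```python
from collections import defaultdict


def tss_code(barcode: str) -> str:
    """Return the tissue source site (TSS) code: the second field of a TCGA barcode."""
    parts = barcode.split("-")
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return parts[1]


def split_by_site(barcodes, train_site="A2", test_site="E2"):
    """Group barcodes by TSS code and return (train, test); other sites are ignored."""
    by_site = defaultdict(list)
    for barcode in barcodes:
        by_site[tss_code(barcode)].append(barcode)
    return sorted(by_site[train_site]), sorted(by_site[test_site])


# Hypothetical barcodes, for illustration only.
train, test = split_by_site(
    ["TCGA-A2-AAAA", "TCGA-E2-BBBB", "TCGA-A2-CCCC-01Z-00-DX1", "TCGA-BH-DDDD"]
)
print(train)  # ['TCGA-A2-AAAA', 'TCGA-A2-CCCC-01Z-00-DX1']
print(test)   # ['TCGA-E2-BBBB']
```

Splitting by site rather than at random keeps all tiles of one hospital on the same side, so the test set measures generalization to an unseen center.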
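In weakly supervised workflows of this kind, only patient-level labels are available. One common scheme, shown here as a minimal sketch and not necessarily what DeepMed does internally, lets every tile inherit its patient's label for training and averages tile-level prediction scores into one score per patient at test time. All identifiers, labels and scores below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def inherit_labels(tile_to_patient, patient_label):
    """Training targets: each tile inherits its patient's label (unlabeled patients are dropped)."""
    return {
        tile: patient_label[pid]
        for tile, pid in tile_to_patient.items()
        if pid in patient_label
    }


def patient_scores(tile_scores, tile_to_patient):
    """Inference: average the tile-level scores of each patient into one patient-level score."""
    per_patient = defaultdict(list)
    for tile, score in tile_scores.items():
        per_patient[tile_to_patient[tile]].append(score)
    return {pid: mean(scores) for pid, scores in per_patient.items()}


# Hypothetical tiles, patients, labels and model scores.
tile_to_patient = {"t1": "P1", "t2": "P1", "t3": "P2"}
print(inherit_labels(tile_to_patient, {"P1": 1, "P2": 0}))  # {'t1': 1, 't2': 1, 't3': 0}
print(patient_scores({"t1": 0.9, "t2": 0.5, "t3": 0.2}, tile_to_patient))
```

Mean pooling is only one aggregation choice; majority voting or attention-based pooling are alternatives.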
The images were preprocessed according to the Aachen Protocol for Deep Learning Histopathology, which is available at https://zenodo.org/record/3694994. Specifically, digital whole slide images (SVS format) of hematoxylin and eosin (H&E) stained slides were tessellated, without manual annotations, into tiles of 256 × 256 px at a resolution of 1 µm/px, so that each tile covers 256 × 256 µm of tissue. The tiles were then color-normalized with the Macenko method as described previously (https://www.nature.com/articles/s43018-020-0087-6) and saved as JPEG files. For the A2 cohort, an additional ZIP archive (tcga-brca-a2-deepmed-tiles_100.zip) contains only 100 randomly selected tiles per patient.

In addition, we provide a CLINI table and a SLIDE table as defined in the Aachen Protocol. The CLINI table contains clinicopathological data for all included patients and is derived from clinical information on www.cbioportal.org and from Thorsson et al. (https://pubmed.ncbi.nlm.nih.gov/29628290/).

We recommend using the A2 dataset for training and the E2 dataset for testing. Please cite the relevant papers if you reuse this dataset; more information is available at www.kather.ai
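The tile geometry follows from the numbers above: a 256 px tile at 1 µm/px spans 256 µm of tissue. The sketch below converts that into a tile size at a slide's native resolution and lays out a non-overlapping grid. The native resolution of 0.25 µm/px in the example is an assumption (it depends on the scanner and should be read from the slide metadata), and the function name is hypothetical. Real pipelines usually also discard background tiles, which is not shown.

```python
TILE_PX = 256      # output tile edge length in pixels (from the dataset description)
TARGET_MPP = 1.0   # output resolution in µm per pixel (from the dataset description)


def tile_grid(width_px, height_px, slide_mpp, tile_px=TILE_PX, target_mpp=TARGET_MPP):
    """Plan a non-overlapping tile grid at the slide's native resolution.

    Each output tile covers tile_px * target_mpp µm of tissue (256 µm here). At the
    native resolution slide_mpp this is `native` pixels; each native region would then
    be resized to tile_px x tile_px. Partial tiles at the right/bottom edges are dropped.
    Returns (native, [(x, y), ...]) with top-left corners in native pixel coordinates.
    """
    native = round(tile_px * target_mpp / slide_mpp)
    cols, rows = width_px // native, height_px // native
    corners = [(c * native, r * native) for r in range(rows) for c in range(cols)]
    return native, corners


# Hypothetical slide: 10,000 x 6,000 px scanned at 0.25 µm/px.
native, corners = tile_grid(10_000, 6_000, slide_mpp=0.25)
print(native, len(corners))  # 1024 45
```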
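For reference, Macenko stain normalization can be sketched compactly in NumPy. The sketch estimates the two stain vectors of an image from the angular extremes of its optical-density (OD) vectors in the plane of the two leading eigenvectors, unmixes stain concentrations by least squares, rescales them to a reference image's 99th-percentile concentrations, and recomposes the image with the reference stain vectors. This is an illustrative reimplementation, not the code used to produce this dataset. The defaults (`io=240`, `alpha=1`, `beta=0.15`) are common choices in public implementations and are assumptions here, as is the heuristic that puts hematoxylin first because of its larger red-channel OD.

```python
import numpy as np


class MacenkoNormalizer:
    """Minimal Macenko stain normalization for RGB uint8 tiles (illustrative sketch)."""

    def __init__(self, io=240.0, alpha=1.0, beta=0.15):
        self.io = io        # assumed background (transmitted light) intensity
        self.alpha = alpha  # percentile used for robust angular extremes
        self.beta = beta    # OD threshold below which pixels count as background
        self.stains_ref = None
        self.max_c_ref = None

    def _od(self, rgb):
        # Beer-Lambert optical density per pixel, shape (N, 3); +1 avoids log(0).
        return -np.log((rgb.reshape(-1, 3).astype(np.float64) + 1.0) / self.io)

    def _stain_vectors(self, od):
        od_hat = od[np.all(od >= self.beta, axis=1)]  # drop near-transparent pixels
        # Uncentred second-moment matrix: its leading eigenvector points into the
        # positive OD octant, so projected angles stay within (-pi/2, pi/2).
        _, eigvecs = np.linalg.eigh(od_hat.T @ od_hat)
        plane = eigvecs[:, [2, 1]]  # eigenvectors of the two largest eigenvalues
        if plane[:, 0].sum() < 0:
            plane[:, 0] *= -1
        proj = od_hat @ plane
        phi = np.arctan2(proj[:, 1], proj[:, 0])
        lo, hi = np.percentile(phi, [self.alpha, 100 - self.alpha])
        v1 = plane @ np.array([np.cos(lo), np.sin(lo)])
        v2 = plane @ np.array([np.cos(hi), np.sin(hi)])
        # Heuristic: hematoxylin has the larger red-channel OD, so put it first.
        return np.stack([v1, v2] if v1[0] > v2[0] else [v2, v1], axis=1)  # (3, 2)

    @staticmethod
    def _concentrations(od, stains):
        # Least-squares unmixing od.T ≈ stains @ C, giving C with shape (2, N).
        return np.linalg.lstsq(stains, od.T, rcond=None)[0]

    def fit(self, target_rgb):
        od = self._od(target_rgb)
        self.stains_ref = self._stain_vectors(od)
        c = self._concentrations(od, self.stains_ref)
        self.max_c_ref = np.percentile(c, 99, axis=1)
        return self

    def transform(self, rgb):
        od = self._od(rgb)
        stains = self._stain_vectors(od)
        c = self._concentrations(od, stains)
        max_c = np.percentile(c, 99, axis=1)
        c *= (self.max_c_ref / max_c)[:, None]  # match reference stain intensity
        od_norm = (self.stains_ref @ c).T       # recompose with reference stains
        out = self.io * np.exp(-od_norm) - 1.0  # invert the OD transform in _od
        return np.clip(out, 0, 255).round().astype(np.uint8).reshape(rgb.shape)
```

Typical use is `MacenkoNormalizer().fit(reference_tile)` once, then `.transform(tile)` for each `(H, W, 3)` uint8 tile. A tile that is almost entirely background leaves too few pixels above `beta` for a stable estimate, so a production pipeline would need to guard against that case.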
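The reduced A2 archive keeps 100 random tiles per patient. A reproducible way to build such a subset is sketched below. The folder layout `<patient>/<tile>.jpg` (encoded in the hypothetical `patient_of` helper) and the seed are assumptions, so this will not reproduce the exact tiles in tcga-brca-a2-deepmed-tiles_100.zip.

```python
import random
from collections import defaultdict


def patient_of(tile_path):
    """Hypothetical layout: tiles stored as '<patient>/<tile>.jpg'."""
    return tile_path.split("/")[0]


def sample_tiles_per_patient(tile_paths, k=100, seed=0):
    """Keep at most k randomly chosen tiles per patient; patients with <= k tiles keep all."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_patient = defaultdict(list)
    for path in tile_paths:
        by_patient[patient_of(path)].append(path)
    subset = {}
    for pid in sorted(by_patient):
        paths = sorted(by_patient[pid])  # fixed order before sampling
        subset[pid] = rng.sample(paths, k) if len(paths) > k else paths
    return subset


# Hypothetical example: P1 has 250 tiles, P2 has 40.
tiles = [f"P1/tile_{i}.jpg" for i in range(250)] + [f"P2/tile_{i}.jpg" for i in range(40)]
subset = sample_tiles_per_patient(tiles)
print({pid: len(paths) for pid, paths in subset.items()})  # {'P1': 100, 'P2': 40}
```

Capping tiles per patient keeps patients with very large slides from dominating training and reduces I/O.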
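The CLINI and SLIDE tables are linked through the patient identifier. The sketch below assumes CSV input with a `PATIENT` column in both tables and a `FILENAME` column in the SLIDE table; check the actual headers and file format in the archive. It attaches each patient's target value to every slide of that patient and skips patients without a value. The `ER_STATUS` column and all table contents are hypothetical.

```python
import csv
import io


def load_rows(handle):
    """Read a CSV table into a list of dicts keyed by column header."""
    return list(csv.DictReader(handle))


def join_clini_slide(clini_rows, slide_rows, target):
    """Return (FILENAME, PATIENT, target value) for every slide whose patient has a value.

    Assumes a PATIENT column in both tables and a FILENAME column in the SLIDE table.
    """
    value_of = {row["PATIENT"]: row[target] for row in clini_rows if row.get(target)}
    return [
        (row["FILENAME"], row["PATIENT"], value_of[row["PATIENT"]])
        for row in slide_rows
        if row["PATIENT"] in value_of
    ]


# Hypothetical in-memory tables; with real files use open(path, newline="") instead.
clini = load_rows(io.StringIO("PATIENT,ER_STATUS\nTCGA-A2-AAAA,Positive\nTCGA-A2-CCCC,\n"))
slide = load_rows(io.StringIO("PATIENT,FILENAME\nTCGA-A2-AAAA,slide_1\nTCGA-A2-CCCC,slide_2\n"))
print(join_clini_slide(clini, slide, "ER_STATUS"))  # [('slide_1', 'TCGA-A2-AAAA', 'Positive')]
```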
Created:
2021-09-01



