Segmentation masks for: Comprehensive evaluation of cross-cancer generalization in histopathology segmentation models across 21 tumor types
DataCite Commons (updated 2026-05-05, indexed 2026-05-07)
Download link:
https://zenodo.org/doi/10.5281/zenodo.18669666
Description:
This dataset accompanies the paper "Comprehensive evaluation of cross-cancer generalization in histopathology segmentation models across 21 tumor types."
It contains 38,080 semantic segmentation masks for 7,616 tumor tissue regions of interest (ROIs) from 21 TCGA cancer types. Each ROI was segmented by five organ-specific deep learning models (breast, colon, lung, kidney, prostate), giving five masks per ROI (7,616 × 5 = 38,080). The masks are single-channel (grayscale) PNG images in which each pixel value is a 0-based class index. The meaning of each index depends on the model that produced the mask, as listed in the table below.
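Because pixel values are class indices rather than intensities, masks should be read as raw integers and never resized with interpolation, which would create values that are not valid classes. In practice, Pillow's `Image.open` followed by `numpy.asarray` is the usual route. The standard-library sketch below makes the format explicit instead. It assumes the masks are 8-bit, non-interlaced grayscale PNGs; the description says "single-channel" but does not state the bit depth. `read_grayscale_png` is a hypothetical helper, not part of the dataset or its code.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _paeth(a: int, b: int, c: int) -> int:
    # Paeth predictor as defined in the PNG specification.
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def read_grayscale_png(data: bytes) -> list[list[int]]:
    """Decode an 8-bit, non-interlaced grayscale PNG into rows of class indices.

    Assumption: color type 0, bit depth 8, no interlacing (checked below).
    """
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, idat = 8, b""
    width = height = None
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # length field + type + body + CRC
        if ctype == b"IHDR":
            width, height, depth, color, _, _, interlace = struct.unpack(">IIBBBBB", body)
            if (depth, color, interlace) != (8, 0, 0):
                raise ValueError("expected 8-bit non-interlaced grayscale")
        elif ctype == b"IDAT":
            idat += body  # image data may be split across several IDAT chunks
        elif ctype == b"IEND":
            break
    raw = zlib.decompress(idat)
    stride = width + 1  # one filter-type byte, then one byte per pixel
    rows, prev = [], [0] * width
    for y in range(height):
        ftype = raw[y * stride]
        if ftype > 4:
            raise ValueError(f"unknown filter type {ftype}")
        line = list(raw[y * stride + 1:(y + 1) * stride])
        for x in range(width):
            a = line[x - 1] if x > 0 else 0  # left (already reconstructed)
            b = prev[x]                      # above
            c = prev[x - 1] if x > 0 else 0  # upper-left
            if ftype == 1:
                line[x] = (line[x] + a) & 0xFF
            elif ftype == 2:
                line[x] = (line[x] + b) & 0xFF
            elif ftype == 3:
                line[x] = (line[x] + (a + b) // 2) & 0xFF
            elif ftype == 4:
                line[x] = (line[x] + _paeth(a, b, c)) & 0xFF
        rows.append(line)
        prev = line
    return rows
```

Usage: `read_grayscale_png(open(path, "rb").read())` returns a list of rows whose values can be looked up in the class table for the mask's model.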
Class index mapping
| Index | Breast (model 1) | Colon (model 2) | Lung (model 3) | Kidney (model 4) | Prostate (model 5) |
|---|---|---|---|---|---|
| 0 | Background | Background | Background | Background | Background |
| 1 | Tumor | Tumor / Adenoma HG | Tumor | Tumor | Tumor |
| 2 | Tumor Stroma | Mucosa / Adenoma LG | Tumor Stroma | Tumor Regression | Normal Epithelium |
| 3 | DCIS | Tumor Stroma | Necrosis | Necrosis | Normal Stroma |
| 4 | LCIS | Submucosa | Mucin | Kidney Benign | Background (slide) |
| 5 | Necrosis | Muscularis | Benign Lung | Urothelium | |
| 6 | Mucin | Adventitia / Vessel | Stroma / Nerve / Fat | Fat | |
| 7 | Inflammation | Lymph Node / Tissue | Blood | Stroma | |
| 8 | Fat | Ulcus / Necrosis | Bronchus | Blood | |
| 9 | Stroma | Blood | Cartilage | Adrenal | |
| 10 | Blood | Mucin | Gland (Bronchial) | Background (slide) | |
| 11 | Skin | Background (slide) | Lymph Aggregates / Lymph Node | | |
| 12 | Benign Epithelium | | Background (slide) | | |
| 13 | Background (slide) | | | | |

A blank cell means the model has no class at that index.
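Since the same pixel value means different tissue classes in different models, a mask can only be interpreted together with its model number N. The sketch below transcribes the class index mapping into a lookup keyed by N and summarizes a decoded mask as per-class pixel fractions. `CLASS_NAMES` and `class_fractions` are hypothetical names; the mask is assumed to be already decoded into a list of rows of integers.

```python
from collections import Counter

# Class names per model number N, transcribed from the class index mapping;
# list position == pixel value stored in the mask.
CLASS_NAMES: dict[int, list[str]] = {
    1: ["Background", "Tumor", "Tumor Stroma", "DCIS", "LCIS", "Necrosis",
        "Mucin", "Inflammation", "Fat", "Stroma", "Blood", "Skin",
        "Benign Epithelium", "Background (slide)"],  # breast
    2: ["Background", "Tumor / Adenoma HG", "Mucosa / Adenoma LG",
        "Tumor Stroma", "Submucosa", "Muscularis", "Adventitia / Vessel",
        "Lymph Node / Tissue", "Ulcus / Necrosis", "Blood", "Mucin",
        "Background (slide)"],  # colon
    3: ["Background", "Tumor", "Tumor Stroma", "Necrosis", "Mucin",
        "Benign Lung", "Stroma / Nerve / Fat", "Blood", "Bronchus",
        "Cartilage", "Gland (Bronchial)", "Lymph Aggregates / Lymph Node",
        "Background (slide)"],  # lung
    4: ["Background", "Tumor", "Tumor Regression", "Necrosis",
        "Kidney Benign", "Urothelium", "Fat", "Stroma", "Blood", "Adrenal",
        "Background (slide)"],  # kidney
    5: ["Background", "Tumor", "Normal Epithelium", "Normal Stroma",
        "Background (slide)"],  # prostate
}


def class_fractions(mask: list[list[int]], model: int) -> dict[str, float]:
    """Fraction of pixels per class name for a decoded mask from model N."""
    names = CLASS_NAMES[model]
    counts = Counter(value for row in mask for value in row)
    total = sum(counts.values())
    fractions = {}
    for index, n in sorted(counts.items()):
        if not 0 <= index < len(names):
            raise ValueError(f"index {index} is not a class of model {model}")
        fractions[names[index]] = n / total
    return fractions
```

Only index 0 (Background) and index 1 (tumor; "Tumor / Adenoma HG" for colon) carry the same meaning in all five models. "Background (slide)", for example, sits at index 13, 11, 12, 10 and 4 respectively, so any cross-model comparison beyond the tumor class needs an explicit per-model mapping such as this one.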
Contents
The dataset is organized as 21 TAR archives, one per TCGA project:
| TCGA project | ROIs | Masks (ROIs × 5) |
|---|---|---|
| TCGA-BLCA | 445 | 2,225 |
| TCGA-BRCA | 1,007 | 5,035 |
| TCGA-CESC | 276 | 1,380 |
| TCGA-CHOL | 38 | 190 |
| TCGA-COADREAD | 590 | 2,950 |
| TCGA-ESCA | 157 | 785 |
| TCGA-HNSC | 456 | 2,280 |
| TCGA-KICH | 109 | 545 |
| TCGA-KIRC | 512 | 2,560 |
| TCGA-KIRP | 282 | 1,410 |
| TCGA-LIHC | 372 | 1,860 |
| TCGA-LUAD | 379 | 1,895 |
| TCGA-LUSC | 301 | 1,505 |
| TCGA-MESO | 84 | 420 |
| TCGA-OV | 106 | 530 |
| TCGA-PAAD | 199 | 995 |
| TCGA-PRAD | 415 | 2,075 |
| TCGA-SKCM | 457 | 2,285 |
| TCGA-STAD | 374 | 1,870 |
| TCGA-THCA | 504 | 2,520 |
| TCGA-UCEC | 553 | 2,765 |
| Total | 7,616 | 38,080 |
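To sanity-check a download, the per-project counts can be compared against an archive's contents without extracting it. The sketch below uses Python's standard `tarfile` module and assumes that every mask member name ends in the `_tumor_model{N}_mask.png` suffix from the file naming convention. `ROIS_PER_PROJECT`, `masks_per_model`, `check_archive`, and the archive path you pass in are hypothetical names, not part of the dataset.

```python
import re
import tarfile
from collections import Counter

# ROI counts per TCGA project, copied from the list above.
ROIS_PER_PROJECT = {
    "TCGA-BLCA": 445, "TCGA-BRCA": 1007, "TCGA-CESC": 276, "TCGA-CHOL": 38,
    "TCGA-COADREAD": 590, "TCGA-ESCA": 157, "TCGA-HNSC": 456, "TCGA-KICH": 109,
    "TCGA-KIRC": 512, "TCGA-KIRP": 282, "TCGA-LIHC": 372, "TCGA-LUAD": 379,
    "TCGA-LUSC": 301, "TCGA-MESO": 84, "TCGA-OV": 106, "TCGA-PAAD": 199,
    "TCGA-PRAD": 415, "TCGA-SKCM": 457, "TCGA-STAD": 374, "TCGA-THCA": 504,
    "TCGA-UCEC": 553,
}
MODELS = range(1, 6)  # 1 = breast ... 5 = prostate
MODEL_SUFFIX = re.compile(r"_tumor_model([1-5])_mask\.png$")


def masks_per_model(tar_path: str) -> Counter:
    """Count mask files per model number N inside one archive."""
    counts = Counter()
    with tarfile.open(tar_path) as tar:  # default mode "r" auto-detects compression
        for member in tar:
            if member.isfile():
                match = MODEL_SUFFIX.search(member.name)
                if match:
                    counts[int(match.group(1))] += 1
    return counts


def check_archive(tar_path: str, project: str) -> bool:
    """True if every model has exactly one mask per ROI of the project."""
    counts = masks_per_model(tar_path)
    expected = ROIS_PER_PROJECT[project]
    return all(counts[n] == expected for n in MODELS) and sum(counts.values()) == 5 * expected
```

Checking each model separately, rather than only the total, also catches an archive in which one model's masks are missing and another's are duplicated.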
Each archive extracts to a directory named after its TCGA project. Individual files follow the naming convention:
{TCGA-ID}_{mpp}_tumor_model{N}_mask.png
TCGA-ID — TCGA case and slide identifier
mpp — spatial resolution of the source ROI in microns per pixel
N — segmentation model (1 = breast, 2 = colon, 3 = lung, 4 = kidney, 5 = prostate)
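The naming convention can be parsed with a regular expression. The sketch below anchors on the fixed `_tumor_model{N}_mask.png` suffix so that it still works if the TCGA-ID part contains underscores. It assumes `{mpp}` is written as a plain decimal number (for example `0.25`), and the example filename in the usage line is invented for illustration. It also converts a pixel count to physical area via area = pixels × mpp². That conversion holds only if the mask has the same resolution as its source ROI, which is an assumption: the description gives mpp for the source ROI only. `parse_mask_name`, `MaskName`, and `pixel_area_mm2` are hypothetical names.

```python
import math
import re
from typing import NamedTuple

MODEL_ORGANS = {1: "breast", 2: "colon", 3: "lung", 4: "kidney", 5: "prostate"}

# {TCGA-ID}_{mpp}_tumor_model{N}_mask.png; greedy tcga_id, mpp = last "_"-free field.
MASK_NAME = re.compile(
    r"^(?P<tcga_id>.+)_(?P<mpp>[^_]+)_tumor_model(?P<model>[1-5])_mask\.png$"
)


class MaskName(NamedTuple):
    tcga_id: str
    mpp: float  # microns per pixel of the source ROI
    model: int  # segmentation model N
    organ: str  # organ the model was trained for


def parse_mask_name(filename: str) -> MaskName:
    match = MASK_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected mask filename: {filename}")
    model = int(match["model"])
    # Assumption: mpp is a plain decimal such as "0.25".
    return MaskName(match["tcga_id"], float(match["mpp"]), model, MODEL_ORGANS[model])


def pixel_area_mm2(pixel_count: int, mpp: float) -> float:
    """Area of pixel_count pixels in mm^2 (1 mm^2 = 1e6 um^2)."""
    return pixel_count * mpp ** 2 / 1e6


name = parse_mask_name("TCGA-AB-1234-01Z-00-DX1_0.25_tumor_model3_mask.png")
print(name.organ, math.isclose(pixel_area_mm2(1_000_000, 0.5), 0.25))  # lung True
```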
Related datasets
Tumor ROI images (input to segmentation): 10.5281/zenodo.18668580
Evaluation data (scoring results, Dice coefficients, clinical metadata): 10.5281/zenodo.18518811
Code (annotation, inference, scoring, and analysis pipeline): 10.5281/zenodo.18520078
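The evaluation data reports Dice coefficients. For a single class c, Dice = 2|A ∩ B| / (|A| + |B|), where A and B are the sets of pixels labelled c in two masks of the same size. Below is a minimal sketch on decoded masks (lists of rows). It is not the authors' scoring code, which is in the Code record above. Returning 1.0 when neither mask contains the class is an assumed convention, and `dice` is a hypothetical name. Because index 1 is the tumor class in all five models, `class_index=1` can compare tumor predictions from two models for the same ROI.

```python
def dice(mask_a: list[list[int]], mask_b: list[list[int]], class_index: int = 1) -> float:
    """Dice coefficient of one class between two equally sized label masks."""
    if len(mask_a) != len(mask_b) or any(len(ra) != len(rb) for ra, rb in zip(mask_a, mask_b)):
        raise ValueError("masks must have the same shape")
    in_a = in_b = both = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for va, vb in zip(row_a, row_b):
            a, b = va == class_index, vb == class_index
            in_a += a
            in_b += b
            both += a and b
    if in_a + in_b == 0:
        return 1.0  # assumed convention: class absent from both masks
    return 2 * both / (in_a + in_b)


print(dice([[1, 1], [0, 0]], [[1, 0], [1, 0]]))  # 0.5
```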
Provider:
Zenodo
Created:
2026-02-20



