
Segmentation masks for: Comprehensive evaluation of cross-cancer generalization in histopathology segmentation models across 21 tumor types

Source: DataCite Commons. Updated 2026-05-05; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.18669666
Description:
This dataset accompanies the paper "Comprehensive evaluation of cross-cancer generalization in histopathology segmentation models across 21 tumor types." It contains 38,080 semantic segmentation masks for 7,616 tumor tissue regions of interest (ROIs) across 21 TCGA cancer types. Each ROI was segmented by five organ-specific deep learning models (breast, colon, lung, kidney, prostate), yielding five masks per ROI. The masks are single-channel (grayscale) PNG images in which each pixel value is a 0-based class index, as detailed in the table below.

Class index mapping

A "-" marks an index that the model does not use.

Index  Breast (model 1)    Colon (model 2)      Lung (model 3)                 Kidney (model 4)    Prostate (model 5)
0      Background          Background           Background                     Background          Background
1      Tumor               Tumor / Adenoma HG   Tumor                          Tumor               Tumor
2      Tumor Stroma        Mucosa / Adenoma LG  Tumor Stroma                   Tumor Regression    Normal Epithelium
3      DCIS                Tumor Stroma         Necrosis                       Necrosis            Normal Stroma
4      LCIS                Submucosa            Mucin                          Kidney Benign       Background (slide)
5      Necrosis            Muscularis           Benign Lung                    Urothelium          -
6      Mucin               Adventitia / Vessel  Stroma / Nerve / Fat           Fat                 -
7      Inflammation        Lymph Node / Tissue  Blood                          Stroma              -
8      Fat                 Ulcus / Necrosis     Bronchus                       Blood               -
9      Stroma              Blood                Cartilage                      Adrenal             -
10     Blood               Mucin                Gland (Bronchial)              Background (slide)  -
11     Skin                Background (slide)   Lymph Aggregates / Lymph Node  -                   -
12     Benign Epithelium   -                    Background (slide)             -                   -
13     Background (slide)  -                    -                              -                   -

Contents

The dataset is organized as 21 TAR archives, one per TCGA project:

TCGA-BLCA (445 ROIs × 5 = 2,225 masks)
TCGA-BRCA (1,007 × 5 = 5,035)
TCGA-CESC (276 × 5 = 1,380)
TCGA-CHOL (38 × 5 = 190)
TCGA-COADREAD (590 × 5 = 2,950)
TCGA-ESCA (157 × 5 = 785)
TCGA-HNSC (456 × 5 = 2,280)
TCGA-KICH (109 × 5 = 545)
TCGA-KIRC (512 × 5 = 2,560)
TCGA-KIRP (282 × 5 = 1,410)
TCGA-LIHC (372 × 5 = 1,860)
TCGA-LUAD (379 × 5 = 1,895)
TCGA-LUSC (301 × 5 = 1,505)
TCGA-MESO (84 × 5 = 420)
TCGA-OV (106 × 5 = 530)
TCGA-PAAD (199 × 5 = 995)
TCGA-PRAD (415 × 5 = 2,075)
TCGA-SKCM (457 × 5 = 2,285)
TCGA-STAD (374 × 5 = 1,870)
TCGA-THCA (504 × 5 = 2,520)
TCGA-UCEC (553 × 5 = 2,765)

Each archive extracts to a directory named after its TCGA project. Individual files follow the naming convention:

{TCGA-ID}_{mpp}_tumor_model{N}_mask.png

TCGA-ID: TCGA case and slide identifier
mpp: spatial resolution of the source ROI in microns per pixel
N: segmentation model (1 = breast, 2 = colon, 3 = lung, 4 = kidney, 5 = prostate)

Related datasets

Tumor ROI images (input to segmentation): 10.5281/zenodo.18668580
Evaluation data (scoring results, Dice coefficients, clinical metadata): 10.5281/zenodo.18518811
Code (annotation, inference, scoring, and analysis pipeline): 10.5281/zenodo.18520078
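
The naming convention and class index mapping described above can be turned into a small lookup helper. The following is a minimal sketch, not part of the published pipeline: the function names `parse_mask_filename` and `class_name`, the regular expression, and the example filename are all hypothetical. It assumes that the `mpp` field contains no underscore and keeps it as a string, since the description does not state its exact format.

```python
import re

# Model number -> organ, as listed in the dataset description.
MODEL_ORGANS = {1: "breast", 2: "colon", 3: "lung", 4: "kidney", 5: "prostate"}

# Class labels per model; the list position is the 0-based pixel value.
CLASS_LABELS = {
    1: ["Background", "Tumor", "Tumor Stroma", "DCIS", "LCIS", "Necrosis",
        "Mucin", "Inflammation", "Fat", "Stroma", "Blood", "Skin",
        "Benign Epithelium", "Background (slide)"],
    2: ["Background", "Tumor / Adenoma HG", "Mucosa / Adenoma LG",
        "Tumor Stroma", "Submucosa", "Muscularis", "Adventitia / Vessel",
        "Lymph Node / Tissue", "Ulcus / Necrosis", "Blood", "Mucin",
        "Background (slide)"],
    3: ["Background", "Tumor", "Tumor Stroma", "Necrosis", "Mucin",
        "Benign Lung", "Stroma / Nerve / Fat", "Blood", "Bronchus",
        "Cartilage", "Gland (Bronchial)", "Lymph Aggregates / Lymph Node",
        "Background (slide)"],
    4: ["Background", "Tumor", "Tumor Regression", "Necrosis",
        "Kidney Benign", "Urothelium", "Fat", "Stroma", "Blood", "Adrenal",
        "Background (slide)"],
    5: ["Background", "Tumor", "Normal Epithelium", "Normal Stroma",
        "Background (slide)"],
}

# Hypothetical pattern for {TCGA-ID}_{mpp}_tumor_model{N}_mask.png.
# Assumption: mpp has no underscore, so the fixed suffix anchors the match
# and everything before the mpp field is treated as the TCGA-ID.
FILENAME_RE = re.compile(
    r"^(?P<tcga_id>.+)_(?P<mpp>[^_]+)_tumor_model(?P<model>[1-5])_mask\.png$"
)


def parse_mask_filename(name):
    """Split a mask filename into TCGA-ID, mpp (as a string), model, organ."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a mask filename: {name!r}")
    model = int(match.group("model"))
    return {
        "tcga_id": match.group("tcga_id"),
        "mpp": match.group("mpp"),
        "model": model,
        "organ": MODEL_ORGANS[model],
    }


def class_name(model, index):
    """Return the label for a pixel value in a mask produced by `model`."""
    labels = CLASS_LABELS[model]
    if not 0 <= index < len(labels):
        raise ValueError(f"model {model} has no class index {index}")
    return labels[index]


if __name__ == "__main__":
    # Made-up filename for illustration only; real identifiers may differ.
    info = parse_mask_filename("TCGA-XX-0000-01Z-00-DX1_0.5_tumor_model3_mask.png")
    print(info["organ"], class_name(info["model"], 5))
```

Because each model uses a different number of classes, a pixel value is only meaningful together with the model number from the filename; for example, index 4 is "LCIS" for the breast model but "Background (slide)" for the prostate model. Reading the pixel values themselves would require an image library, which this sketch deliberately leaves out.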
Provider:
Zenodo
Created:
2026-02-20