Digital Pathology Dataset for Prostate Cancer Diagnosis
Source: NIAID Data Ecosystem. Indexed 2026-03-14.
Download link:
https://zenodo.org/record/5971763
Description:
Links to code and bioRxiv pre-print:
1. Multi-lens Neural Machine (MLNM) Code
2. An AI-assisted Tool For Efficient Prostate Cancer Diagnosis (bioRxiv Pre-print)
Digitized hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) of 40 prostatectomy and 59 core needle biopsy specimens were collected from 99 prostate cancer patients at Tan Tock Seng Hospital, Singapore. Each specimen had one WSI, giving 99 WSIs in total. The H&E-stained slides were scanned at 40× magnification (specimen-level pixel size 0.25 μm × 0.25 μm) using an Aperio AT2 Slide Scanner (Leica Biosystems). Institutional review board approval was obtained from the hospital for this study, and all data were de-identified.
Prostate glandular structures in the core needle biopsy slides were manually annotated and classified using the ASAP annotation tool. A senior pathologist reviewed 10% of the annotations in each slide, ensuring that some reference annotations were provided to the researcher at different regions of the core. Note that partial glands appearing at the edges of the biopsy cores were not annotated.
Patches of size 512 × 512 pixels were cropped from the whole-slide images at magnifications of 5×, 10×, 20×, and 40×, with an annotated gland centered in each patch. This dataset contains these cropped images.
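To make the patch geometry concrete, the following is a minimal sketch of how a 512 × 512 patch centered on a gland maps back to a window in the 40× scan. It assumes that lower magnifications correspond to downsampling the 40× image by a factor of 40/magnification; the description does not state how the authors actually produced the lower-resolution patches. The function names `crop_window` and `field_of_view_um` and the example gland center are hypothetical.

```python
BASE_MAGNIFICATION = 40     # scan magnification given in the description
BASE_PIXEL_SIZE_UM = 0.25   # pixel size at 40x given in the description
PATCH_SIZE = 512            # patch edge length in pixels


def crop_window(center_x, center_y, magnification):
    """Return (left, top, width, height) in 40x pixel coordinates.

    Assumption: a patch at a lower magnification covers a larger area of
    the 40x image, scaled by BASE_MAGNIFICATION / magnification.
    """
    downsample = BASE_MAGNIFICATION / magnification
    extent = int(round(PATCH_SIZE * downsample))
    left = int(round(center_x - extent / 2))
    top = int(round(center_y - extent / 2))
    return left, top, extent, extent


def field_of_view_um(magnification):
    """Edge length, in micrometers, of tissue covered by one patch."""
    return PATCH_SIZE * BASE_PIXEL_SIZE_UM * BASE_MAGNIFICATION / magnification


# Hypothetical gland center at (10000, 8000) in 40x pixel coordinates.
for mag in (5, 10, 20, 40):
    print(mag, crop_window(10000, 8000, mag), field_of_view_um(mag))
```

Under this assumption a 40× patch covers 128 μm of tissue per side, while a 5× patch centered on the same gland covers 1024 μm per side, so the lower magnifications supply surrounding context for the same gland.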
This dataset was used to train two AI models, one for gland segmentation (99 patients) and one for gland classification (46 patients). Tables 1 and 2 summarize the gland segmentation and gland classification datasets. The two corresponding sub-datasets are provided as two zip files:
gland_segmentation_dataset.zip
gland_classification_dataset.zip
Table 1: The number of slides and patches in the training, validation, and test sets for the gland segmentation task. There is one H&E-stained WSI for each prostatectomy or core needle biopsy specimen.
#Slides          Train    Valid    Test     Total
Prostatectomy    17       8        15       40
Biopsy           26       13       20       59
Total            43       21       35       99

#Patches         Train    Valid    Test     Total
Prostatectomy    7795     3753     7224     18772
Biopsy           5559     4028     5981     15568
Total            13354    7781     13205    34340
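As a quick consistency check on Table 1, the sketch below recomputes the per-split patch totals and the share of patches in each split from the per-specimen-type counts. The counts are copied from the table; the helper names `split_totals` and `split_fractions` are hypothetical and not part of the released code.

```python
# Patch counts copied from Table 1 (gland segmentation task).
patches = {
    "Prostatectomy": {"train": 7795, "valid": 3753, "test": 7224},
    "Biopsy": {"train": 5559, "valid": 4028, "test": 5981},
}


def split_totals(table):
    """Sum the counts of every specimen type for each split."""
    totals = {}
    for counts in table.values():
        for split, n in counts.items():
            totals[split] = totals.get(split, 0) + n
    return totals


def split_fractions(table):
    """Fraction of all patches that falls into each split."""
    totals = split_totals(table)
    grand_total = sum(totals.values())
    return {split: n / grand_total for split, n in totals.items()}


print(split_totals(patches))
print(split_fractions(patches))
```

The recomputed totals match the "Total" row of the table, and the patches divide roughly 39% / 23% / 38% across the training, validation, and test sets.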
Table 2: The number of slides and patches in the training, validation, and test sets for the gland classification task. There is one H&E-stained WSI for each prostatectomy or core needle biopsy specimen. The gland classification datasets are subsets of the gland segmentation datasets. GS: Gleason Score. B: Benign. M: Malignant.
#Slides (GS 3+3:3+4:4+3)    Train        Valid        Test         Total
Biopsy                      10:9:1       3:7:0        6:10:0       19:26:1

#Patches (B:M)              Train        Valid        Test         Total
Biopsy                      1557:2277    1216:1341    1543:2718    4316:6336
NB: The gland classification folder (gland_classification_dataset.zip) may contain extra patches whose labels could not be identified from the H&E slides. They were not used in the machine learning study.
Created:
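The B:M cells in Table 2 can be read programmatically to see how balanced the two classes are in each split. The sketch below parses the colon-separated counts copied from the table and computes the malignant fraction per split. The helper names `parse_ratio` and `malignant_fraction` are hypothetical, and the counts cover only the labeled patches reported in the table, not any extra unlabeled patches in the zip file.

```python
# Benign:Malignant patch counts copied from Table 2 (biopsy slides only).
table2_patches = {
    "train": "1557:2277",
    "valid": "1216:1341",
    "test": "1543:2718",
}


def parse_ratio(cell):
    """Split a colon-separated cell such as '1557:2277' into integers."""
    return tuple(int(part) for part in cell.split(":"))


def malignant_fraction(cell):
    """Share of malignant patches in a 'B:M' cell."""
    benign, malignant = parse_ratio(cell)
    return malignant / (benign + malignant)


# Column-wise sums over the three splits: [benign total, malignant total].
totals = [sum(col) for col in zip(*(parse_ratio(c) for c in table2_patches.values()))]

for split, cell in table2_patches.items():
    print(split, round(malignant_fraction(cell), 3))
print("totals (B, M):", totals)
```

The summed counts reproduce the 4316:6336 total in the table, and malignant patches form the majority in every split.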
2022-12-05



