Digital Pathology Dataset for Prostate Cancer Diagnosis

Indexed by NIAID Data Ecosystem on 2026-03-14
Download link:
https://zenodo.org/record/5971763
Description:
Links to code and bioRxiv pre-print:
1. Multi-lens Neural Machine (MLNM) code
2. An AI-assisted Tool For Efficient Prostate Cancer Diagnosis (bioRxiv pre-print)

Digitized hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) of 40 prostatectomy and 59 core needle biopsy specimens were collected from 99 prostate cancer patients at Tan Tock Seng Hospital, Singapore. Each specimen has exactly one WSI, giving 99 WSIs in total. The H&E-stained slides were scanned at 40× magnification (specimen-level pixel size 0.25 μm × 0.25 μm) using an Aperio AT2 slide scanner (Leica Biosystems). Institutional review board approval was obtained from the hospital for this study, and all data were de-identified.

Prostate glandular structures in the core needle biopsy slides were manually annotated and classified using the ASAP annotation tool. A senior pathologist reviewed 10% of the annotations in each slide, so that the researcher had reference annotations from different regions of each core. Partial glands at the edges of the biopsy cores were not annotated.

Patches of 512 × 512 pixels were cropped from the whole-slide images at 5×, 10×, 20×, and 40× magnification, each centered on an annotated gland. This dataset contains these cropped patches. It was used to train two AI models: one for gland segmentation (99 patients) and one for gland classification (46 patients). Tables 1 and 2 summarize the gland segmentation and gland classification datasets, respectively. The two sub-datasets are provided as two zip files:
gland_segmentation_dataset.zip
gland_classification_dataset.zip

Table 1: The number of slides and patches in the training, validation, and test sets for the gland segmentation task. There is one H&E-stained WSI for each prostatectomy or core needle biopsy specimen.
Number of slides
                 Train   Valid   Test   Total
Prostatectomy       17       8     15      40
Biopsy              26      13     20      59
Total               43      21     35      99

Number of patches
                 Train   Valid   Test   Total
Prostatectomy     7795    3753   7224   18772
Biopsy            5559    4028   5981   15568
Total            13354    7781  13205   34340

Table 2: The number of slides and patches in the training, validation, and test sets for the gland classification task. There is one H&E-stained WSI for each prostatectomy or core needle biopsy specimen. The gland classification datasets are subsets of the gland segmentation datasets. GS: Gleason score; B: benign; M: malignant.

Number of slides (GS 3+3 : 3+4 : 4+3)
          Train      Valid      Test       Total
Biopsy    10:9:1     3:7:0      6:10:0     19:26:1

Number of patches (B : M)
          Train      Valid      Test       Total
Biopsy    1557:2277  1216:1341  1543:2718  4316:6336

NB: The gland classification archive (gland_classification_dataset.zip) may contain extra patches whose labels could not be identified from the H&E slides. These patches were not used in the machine learning study.
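The multi-magnification patch geometry described above can be made concrete with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the description does not state: that pixel size scales inversely with magnification from the stated 0.25 μm at 40×, and that a lower-magnification patch corresponds to a larger 40× region downsampled to 512 × 512 pixels. The function names and the gland-centre coordinates are hypothetical; how the authors actually performed the cropping is not described.

```python
# Hypothetical helpers illustrating the patch geometry. Assumptions (not stated in the
# dataset description): pixel size scales inversely with magnification, and a patch at a
# lower magnification is obtained by downsampling a larger 40x region to 512x512 pixels.

BASE_MAGNIFICATION = 40     # scanning magnification (from the description)
BASE_PIXEL_SIZE_UM = 0.25   # pixel size at 40x in micrometres (from the description)
PATCH_SIZE_PX = 512         # patch side length in pixels (from the description)
MAGNIFICATIONS = (5, 10, 20, 40)


def pixel_size_um(magnification: int) -> float:
    """Pixel size at a given magnification, assuming inverse scaling from 40x."""
    return BASE_PIXEL_SIZE_UM * BASE_MAGNIFICATION / magnification


def field_of_view_um(magnification: int) -> float:
    """Physical side length of a 512x512 patch at the given magnification."""
    return PATCH_SIZE_PX * pixel_size_um(magnification)


def centered_box_level0(cx: int, cy: int, magnification: int) -> tuple[int, int, int, int]:
    """(x0, y0, width, height) in 40x pixel coordinates of the square region that,
    once downsampled to `magnification`, gives a 512x512 patch centred on (cx, cy).
    (cx, cy) is a hypothetical gland centre in 40x pixel coordinates."""
    downsample = BASE_MAGNIFICATION // magnification
    side = PATCH_SIZE_PX * downsample
    return cx - side // 2, cy - side // 2, side, side


if __name__ == "__main__":
    for mag in MAGNIFICATIONS:
        print(f"{mag}x: {pixel_size_um(mag)} um/px, patch covers {field_of_view_um(mag)} um,"
              f" 40x box for a gland at (10000, 8000): {centered_box_level0(10000, 8000, mag)}")
```

Under these assumptions a 512 × 512 patch covers 128 μm × 128 μm at 40× but 1024 μm × 1024 μm at 5×, so the low-magnification patches show far more of the tissue around the central gland. A gland close to the slide border produces a box with negative coordinates, which a real pipeline would need to pad or skip.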
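The split sizes in Tables 1 and 2 can be cross-checked and summarized with a few lines of code. This sketch simply transcribes the patch counts from the tables; the dictionary layout and helper names are hypothetical and say nothing about how files are organized inside the zip archives.

```python
# Patch counts transcribed from Table 1 (gland segmentation).
SEGMENTATION_PATCHES = {
    "Prostatectomy": {"Train": 7795, "Valid": 3753, "Test": 7224},
    "Biopsy": {"Train": 5559, "Valid": 4028, "Test": 5981},
}

# (benign, malignant) biopsy patch counts transcribed from Table 2 (gland classification).
CLASSIFICATION_PATCHES = {
    "Train": (1557, 2277),
    "Valid": (1216, 1341),
    "Test": (1543, 2718),
}


def split_totals(table: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum patch counts over specimen types for each split (the Total row of Table 1)."""
    totals: dict[str, int] = {}
    for counts in table.values():
        for split, n in counts.items():
            totals[split] = totals.get(split, 0) + n
    return totals


def malignant_fraction(benign: int, malignant: int) -> float:
    """Fraction of patches labelled malignant."""
    return malignant / (benign + malignant)


if __name__ == "__main__":
    totals = split_totals(SEGMENTATION_PATCHES)
    print(totals, "overall:", sum(totals.values()))
    for split, (b, m) in CLASSIFICATION_PATCHES.items():
        print(f"{split}: {malignant_fraction(b, m):.1%} malignant")
```

With these counts the per-split sums reproduce the Total row of Table 1 (13354, 7781, and 13205 patches, 34340 overall), and malignant patches make up roughly 59% of the training set, 52% of the validation set, and 64% of the test set for gland classification, so the classes are moderately imbalanced in every split.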
Created:
2022-12-05