
Pan-Cancer-Nuclei-Seg-DICOM: DICOM converted Dataset of Segmented Nuclei in Hematoxylin and Eosin Stained Histopathology Images

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/11099004
Description:
This dataset corresponds to a collection of images and/or image-derived data available from the National Cancer Institute Imaging Data Commons (IDC) [1]. It was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images in the IDC Portal: Pan-Cancer-Nuclei-Seg-DICOM. You can use the manifests included in this Zenodo record to download the content of the collection by following the download instructions below.

Collection description

This collection contains automatic nucleus segmentation data for 5,060 whole slide tissue images of 10 cancer types, published earlier in [2] (https://doi.org/10.7937/TCIA.2019.4A4DKP9U), stored in DICOM Bulk Annotation and DICOM Segmentation formats.

In the DICOM Bulk Annotation version, nuclei are stored as closed polygons along with the area of each nucleus. The DICOM Segmentation version contains binary segmentations obtained by rasterizing the polygon contours.

The annotations correspond to digital pathology images from the TCGA-BLCA, TCGA-BRCA, TCGA-CESC, TCGA-COAD, TCGA-GBM, TCGA-LUAD, TCGA-LUSC, TCGA-PAAD, TCGA-PRAD, TCGA-READ, TCGA-SKCM, TCGA-STAD, TCGA-UCEC and TCGA-UVM collections available in NCI Imaging Data Commons.

To learn how these files are organized and how to access the content programmatically, see this documentation page: https://highdicom.readthedocs.io/en/latest/ann.html.

The nuclei segmentations were converted from the original format into DICOM ANN and SEG representations using the code available at https://doi.org/10.5281/zenodo.10632181.
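The two representations carry the same geometry. Each nucleus is a closed polygon with a stored area in the Bulk Annotation objects, and it becomes a set of foreground pixels in the Segmentation objects. The minimal sketch below illustrates that relationship. It computes polygon area with the shoelace formula and rasterizes the polygon with an even-odd point-in-polygon test at pixel centers. It is not the conversion code referenced above. The (x, y) pixel-coordinate convention, the pixel-center rule, and the sample contour are all assumptions made for illustration.

```python
"""Sketch: polygon area (shoelace formula) and rasterization to a binary mask.

Assumptions (not taken from the IDC conversion code): vertices are (x, y)
pixel coordinates, and pixel (x, y) is foreground when its center
(x + 0.5, y + 0.5) lies inside the polygon under the even-odd rule.
"""
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(vertices: Sequence[Point]) -> float:
    """Area of a simple closed polygon; vertex order may be CW or CCW."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the contour
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def point_in_polygon(px: float, py: float, vertices: Sequence[Point]) -> bool:
    """Even-odd ray casting: toggle on each edge crossed to the right of (px, py)."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        if (y0 > py) != (y1 > py):
            # x coordinate where this edge crosses the horizontal line y = py
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if px < x_cross:
                inside = not inside
    return inside


def rasterize(vertices: Sequence[Point], width: int, height: int) -> List[List[int]]:
    """Binary mask as rows of 0/1, with 1 where the pixel center is inside."""
    return [
        [1 if point_in_polygon(x + 0.5, y + 0.5, vertices) else 0 for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # Made-up 4 x 3 rectangular "nucleus" with its top-left corner at (1, 1).
    contour = [(1.0, 1.0), (5.0, 1.0), (5.0, 4.0), (1.0, 4.0)]
    mask = rasterize(contour, width=8, height=6)
    print(polygon_area(contour))  # 12.0
    print(sum(map(sum, mask)))    # 12
```

For an axis-aligned rectangle the pixel count equals the polygon area exactly. For a real nucleus with a curved boundary, the rasterized pixel count only approximates the stored area, and the size of the difference depends on the rasterization rule used.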
Annotations for the following container ID in the source failed to convert because the pixel matrix was too large to store:
TCGA-OL-A66K-01Z-00-DX1

Annotations for the following container IDs in the source failed to convert because the annotated images could not be found using the container IDs:
TCGA-CU-A3QU-01Z-00-DX1
TCGA-A2-A0D1-01Z-00-DX1
TCGA-AQ-A1H2-01Z-00-DX1
TCGA-AQ-A1H3-01Z-00-DX1
TCGA-BH-A0B2-01Z-00-DX1
TCGA-E2-A15E-01Z-00-DX1
TCGA-E2-A1IP-01Z-00-DX1
TCGA-F4-6857-01Z-00-DX1
TCGA-12-0773-01Z-00-DX4
TCGA-35-3621-01Z-00-DX1
TCGA-49-4486-01Z-00-DX1
TCGA-33-4587-01Z-00-DX1
TCGA-D9-A1X3-01Z-00-DX1
TCGA-D9-A1X3-01Z-00-DX2
TCGA-D9-A4Z6-01Z-00-DX1
TCGA-EE-A17Y-01Z-00-DX1
TCGA-EE-A29R-01Z-00-DX1
TCGA-EE-A2A0-01Z-00-DX1
TCGA-EE-A2MS-01Z-00-DX1
TCGA-ER-A199-01Z-00-DX1
TCGA-ER-A1A1-01Z-00-DX1
TCGA-ER-A2NC-01Z-00-DX1
TCGA-FS-A1Z7-06Z-00-DX10
TCGA-FS-A1Z7-06Z-00-DX11
TCGA-FS-A1Z7-06Z-00-DX12
TCGA-FS-A1Z7-06Z-00-DX13
TCGA-FS-A1ZN-01Z-00-DX10
TCGA-FS-A1ZN-01Z-00-DX11
TCGA-FS-A1ZW-06Z-00-DX10
TCGA-FS-A1ZW-06Z-00-DX11
TCGA-GN-A261-01Z-00-DX1
TCGA-GN-A266-01Z-00-DX1
TCGA-GN-A268-01Z-00-DX1
TCGA-GN-A26A-01Z-00-DX1
TCGA-XV-AB01-01Z-00-DX1
TCGA-AJ-A23O-01Z-00-DX1
TCGA-AP-A056-01Z-00-DX1
TCGA-BK-A139-01Z-00-DX1
TCGA-E6-A1M0-01Z-00-DX1

Files included

A manifest file's name indicates the IDC data release in which a version of the collection data was first introduced. For example, pan_cancer_nuclei_seg_dicom-collection_id-idc_v19-aws.s5cmd corresponds to the annotations for the images in the collection_id collection introduced in IDC data release v19. DICOM binary segmentations were introduced in IDC v20. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.
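Because the release number is encoded in the file name, it can be recovered programmatically, for example to group manifests by the IDC release that introduced them. The following is a minimal sketch that assumes every manifest name follows the pattern of the example above. The regular expression is inferred from that single example plus the storage suffixes used in this record, and `tcga_brca` is a hypothetical collection ID used only for illustration.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the example name given in this record (an assumption):
#   pan_cancer_nuclei_seg_dicom-<collection_id>-idc_v<release>-<aws|gcs|dcf>.<s5cmd|dcf>
# The collection part is allowed to be empty.
MANIFEST_RE = re.compile(
    r"^pan_cancer_nuclei_seg_dicom-(?P<collection>[^-]*)"
    r"-idc_v(?P<release>\d+)"
    r"-(?P<source>aws|gcs|dcf)\.(?:s5cmd|dcf)$"
)


class ManifestInfo(NamedTuple):
    collection_id: str  # collection ID embedded in the name (may be empty)
    idc_release: int    # IDC data release in which this version was first introduced
    source: str         # "aws", "gcs", or "dcf" (Gen3)


def parse_manifest_name(name: str) -> Optional[ManifestInfo]:
    """Split a manifest file name into its parts; return None if it does not match."""
    match = MANIFEST_RE.match(name)
    if match is None:
        return None
    return ManifestInfo(match["collection"], int(match["release"]), match["source"])


if __name__ == "__main__":
    # Hypothetical file name used only for illustration.
    print(parse_manifest_name("pan_cancer_nuclei_seg_dicom-tcga_brca-idc_v20-aws.s5cmd"))
    # ManifestInfo(collection_id='tcga_brca', idc_release=20, source='aws')
```

Sorting parsed results by `idc_release` orders the versions of a collection by the release that introduced them. This record does not say whether a later version replaces or complements an earlier one, so the sketch deliberately does not discard older manifests.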
For each of the collections, the following manifest files are provided:
pan_cancer_nuclei_seg_dicom--idc_v20-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
pan_cancer_nuclei_seg_dicom--idc_v20-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
pan_cancer_nuclei_seg_dicom--idc_v20-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files ending in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those ending in -gcs.s5cmd reference files in Google Cloud Storage. The files themselves are identical and are mirrored between AWS and Google Cloud.

Download instructions

Each manifest includes instructions in its header on how to download the listed files.

To download the files using the .s5cmd manifests:
1. Install the idc-index package: pip install --upgrade idc-index
2. Download the files referenced by a manifest included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

To download the files using the .dcf manifest, see the manifest header.

Acknowledgments

The Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E., & Kikinis, R. (2023). National Cancer Institute Imaging Data Commons: Toward transparency, reproducibility, and scalability in imaging artificial intelligence. Radiographics, 43.
[2] Hou, L., Gupta, R., Van Arnam, J. S., Zhang, Y., Sivalenka, K., Samaras, D., Kurc, T., & Saltz, J. H. (2019). Dataset of Segmented Nuclei in Hematoxylin and Eosin Stained Histopathology Images of 10 Cancer Types [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/TCIA.2019.4A4DKP9U
Created: 2025-02-07