lz222/PanNuke

Name: lz222/PanNuke
Creator: lz222
Published: 2026-03-23 15:31:24
License: not described on the listing page (the dataset card metadata specifies cc-by-nc-sa-4.0)

Hugging Face · Updated 2026-03-23 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/lz222/PanNuke


Resource description:

---
dataset_info:
  features:
  - name: image
    dtype:
      image:
        mode: RGB
  - name: instances
    sequence:
      image:
        mode: '1'
  - name: categories
    sequence:
      class_label:
        names:
          '0': Neoplastic
          '1': Inflammatory
          '2': Connective
          '3': Dead
          '4': Epithelial
  - name: tissue
    dtype:
      class_label:
        names:
          '0': Adrenal Gland
          '1': Bile Duct
          '2': Bladder
          '3': Breast
          '4': Cervix
          '5': Colon
          '6': Esophagus
          '7': Head & Neck
          '8': Kidney
          '9': Liver
          '10': Lung
          '11': Ovarian
          '12': Pancreatic
          '13': Prostate
          '14': Skin
          '15': Stomach
          '16': Testis
          '17': Thyroid
          '18': Uterus
  splits:
  - name: fold1
    num_bytes: 283673837.64
    num_examples: 2656
  - name: fold2
    num_bytes: 267595457.439
    num_examples: 2523
  - name: fold3
    num_bytes: 293079722.82
    num_examples: 2722
  download_size: 1665092597
  dataset_size: 844349017.8989999
configs:
- config_name: default
  data_files:
  - split: fold1
    path: data/fold1-*
  - split: fold2
    path: data/fold2-*
  - split: fold3
    path: data/fold3-*
license: cc-by-nc-sa-4.0
task_categories:
- image-segmentation
task_ids:
- instance-segmentation
language:
- en
tags:
- medical
- cell nuclei
- H&E
pretty_name: PanNuke
size_categories:
- 1K<n<10K
paperswithcode_id: pannuke
---

# PanNuke

[![](https://github.com/Mr-TalhaIlyas/Prerpcessing-PanNuke-Nuclei-Instance-Segmentation-Dataset/blob/master/screens/img1.png?raw=true)](https://warwick.ac.uk/fac/cross_fac/tia/data/pannuke)

## Dataset Description

- **Homepage:** [PanNuke Dataset for Nuclei Instance Segmentation and Classification](https://warwick.ac.uk/fac/cross_fac/tia/data/pannuke)
- **Leaderboard:** [Panoptic Segmentation](https://paperswithcode.com/sota/panoptic-segmentation-on-pannuke)

## Description

PanNuke is a semi-automatically generated dataset for nuclei instance segmentation and classification, providing comprehensive nuclei annotations across 19 tissue types and 5 distinct cell categories. The dataset includes a total of **189,744 labeled nuclei**, each accompanied by an instance segmentation mask, and contains **7,901 images**, each sized **256×256 pixels**. The images were captured at **40× magnification** with a resolution of **0.25 µm/pixel**. The dataset is highly imbalanced, with the **"Dead" nuclei category** being particularly underrepresented.

Please note that the dataset was created by extracting patches from whole-slide images (WSIs). As a result, some nuclei located at the edges of patches may be cropped, with fewer than 10 visible pixels in certain cases.

## Dataset Structure

The dataset is organized into three folds: `fold1`, `fold2`, and `fold3`, consistent with the original dataset structure. Each fold contains data in a tabular format with the following four columns:

- **`image`**: The RGB tile of the sample.
- **`instances`**: A list of nuclei instances. Each instance represents exactly one nucleus and is in binary format (`1` - nucleus, `0` - background).
- **`categories`**: An integer class label for each nucleus, corresponding to one of the following categories:
  0. Neoplastic
  1. Inflammatory
  2. Connective
  3. Dead
  4. Epithelial
- **`tissue`**: The integer tissue type from which the sample originates, belonging to one of these categories:
  0. Adrenal Gland
  1. Bile Duct
  2. Bladder
  3. Breast
  4. Cervix
  5. Colon
  6. Esophagus
  7. Head & Neck
  8. Kidney
  9. Liver
  10. Lung
  11. Ovarian
  12. Pancreatic
  13. Prostate
  14. Skin
  15. Stomach
  16. Testis
  17. Thyroid
  18. Uterus

## Citation

```bibtex
@inproceedings{gamper2019pannuke,
  title={PanNuke: an open pan-cancer histology dataset for nuclei instance segmentation and classification},
  author={Gamper, Jevgenij and Koohbanani, Navid Alemi and Benes, Ksenija and Khuram, Ali and Rajpoot, Nasir},
  booktitle={European Congress on Digital Pathology},
  pages={11--19},
  year={2019},
  organization={Springer}
}
```

```bibtex
@article{gamper2020pannuke,
  title={PanNuke Dataset Extension, Insights and Baselines},
  author={Gamper, Jevgenij and Koohbanani, Navid Alemi and Graham, Simon and Jahanifar, Mostafa and Khurram, Syed Ali and Azam, Ayesha and Hewitt, Katherine and Rajpoot, Nasir},
  journal={arXiv preprint arXiv:2003.10778},
  year={2020}
}
```
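The `instances` and `categories` columns described in the card store one binary mask and one class id per nucleus. Many segmentation pipelines expect a single labeled map per tile instead, so the following is a minimal sketch of one way to merge them. The function name `build_label_maps`, the `min_pixels` threshold, and the choice to shift class ids by one so that 0 can mean background are all assumptions made for illustration; none of them come from the dataset card.

```python
import numpy as np


def build_label_maps(instances, categories, min_pixels=0, shape=(256, 256)):
    """Merge per-nucleus binary masks into an instance map and a class map.

    instances  : sequence of 2-D masks (anything np.asarray accepts), nonzero = nucleus
    categories : sequence of integer class ids (0..4), one per mask
    min_pixels : hypothetical knob; masks with fewer foreground pixels are skipped,
                 e.g. to drop nuclei cropped at the tile edge
    shape      : tile size, 256x256 according to the card
    """
    if len(instances) != len(categories):
        raise ValueError("instances and categories must have the same length")

    instance_map = np.zeros(shape, dtype=np.int32)  # 0 = background, 1..N = nucleus id
    class_map = np.zeros(shape, dtype=np.int32)     # 0 = background, class id + 1 otherwise

    next_id = 1
    for mask, cat in zip(instances, categories):
        mask = np.asarray(mask).astype(bool)
        if mask.shape != tuple(shape):
            raise ValueError(f"mask shape {mask.shape} does not match {shape}")
        if mask.sum() < min_pixels:
            continue  # too small, treated as a cropped fragment
        # Later masks overwrite earlier ones where they overlap.
        instance_map[mask] = next_id
        class_map[mask] = int(cat) + 1
        next_id += 1
    return instance_map, class_map


# Synthetic 4x4 example: one 4-pixel nucleus and one 1-pixel fragment.
a = np.zeros((4, 4), dtype=bool)
a[0:2, 0:2] = True
b = np.zeros((4, 4), dtype=bool)
b[3, 3] = True
inst, cls = build_label_maps([a, b], [0, 3], min_pixels=2, shape=(4, 4))
```

With the Hugging Face `datasets` library, a row could presumably be passed as `build_label_maps(row["instances"], row["categories"])` after `load_dataset("lz222/PanNuke", split="fold1")`; that usage has not been verified against this repository.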

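The card lists three splits (`fold1`, `fold2`, `fold3`) with their example counts but does not prescribe how to combine them. A small sketch of one common convention, rotating the train, validation, and test roles across the folds, is shown here; the rotation scheme and the helper name `fold_rotations` are assumptions, not something the card specifies. The example counts are copied from the card metadata.

```python
# Example counts per split, as listed in the dataset card metadata.
FOLD_SIZES = {"fold1": 2656, "fold2": 2523, "fold3": 2722}


def fold_rotations(folds=("fold1", "fold2", "fold3")):
    """Return (train, validation, test) assignments, rotating roles across folds."""
    n = len(folds)
    return [
        (folds[i], folds[(i + 1) % n], folds[(i + 2) % n])
        for i in range(n)
    ]


rotations = fold_rotations()
total_images = sum(FOLD_SIZES.values())

for train, val, test in rotations:
    print(f"train={train} ({FOLD_SIZES[train]}), "
          f"val={val} ({FOLD_SIZES[val]}), "
          f"test={test} ({FOLD_SIZES[test]})")
print("total images:", total_images)
```

The per-fold counts sum to the 7,901 images stated in the description, which is a quick consistency check on the metadata.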

Provider:

lz222
