
lz222/PanNuke

Source: Hugging Face. Updated 2026-03-23; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/lz222/PanNuke
Resource description:
---
dataset_info:
  features:
  - name: image
    dtype:
      image:
        mode: RGB
  - name: instances
    sequence:
      image:
        mode: '1'
  - name: categories
    sequence:
      class_label:
        names:
          '0': Neoplastic
          '1': Inflammatory
          '2': Connective
          '3': Dead
          '4': Epithelial
  - name: tissue
    dtype:
      class_label:
        names:
          '0': Adrenal Gland
          '1': Bile Duct
          '2': Bladder
          '3': Breast
          '4': Cervix
          '5': Colon
          '6': Esophagus
          '7': Head & Neck
          '8': Kidney
          '9': Liver
          '10': Lung
          '11': Ovarian
          '12': Pancreatic
          '13': Prostate
          '14': Skin
          '15': Stomach
          '16': Testis
          '17': Thyroid
          '18': Uterus
  splits:
  - name: fold1
    num_bytes: 283673837.64
    num_examples: 2656
  - name: fold2
    num_bytes: 267595457.439
    num_examples: 2523
  - name: fold3
    num_bytes: 293079722.82
    num_examples: 2722
  download_size: 1665092597
  dataset_size: 844349017.8989999
configs:
- config_name: default
  data_files:
  - split: fold1
    path: data/fold1-*
  - split: fold2
    path: data/fold2-*
  - split: fold3
    path: data/fold3-*
license: cc-by-nc-sa-4.0
task_categories:
- image-segmentation
task_ids:
- instance-segmentation
language:
- en
tags:
- medical
- cell nuclei
- H&E
pretty_name: PanNuke
size_categories:
- 1K<n<10K
paperswithcode_id: pannuke
---

# PanNuke

[![](https://github.com/Mr-TalhaIlyas/Prerpcessing-PanNuke-Nuclei-Instance-Segmentation-Dataset/blob/master/screens/img1.png?raw=true)](https://warwick.ac.uk/fac/cross_fac/tia/data/pannuke)

## Dataset Description

- **Homepage:** [PanNuke Dataset for Nuclei Instance Segmentation and Classification](https://warwick.ac.uk/fac/cross_fac/tia/data/pannuke)
- **Leaderboard:** [Panoptic Segmentation](https://paperswithcode.com/sota/panoptic-segmentation-on-pannuke)

## Description

PanNuke is a semi-automatically generated dataset for nuclei instance segmentation and classification, providing comprehensive nuclei annotations across 19 tissue types and 5 distinct cell categories. The dataset includes a total of **189,744 labeled nuclei**, each accompanied by an instance segmentation mask, and contains **7,901 images**, each sized **256×256 pixels**. The images were captured at **x40 magnification** with a resolution of **0.25 µm/pixel**. The dataset is highly imbalanced, with the **"Dead" nuclei category** being particularly underrepresented.

Please note that the dataset was created by extracting patches from whole-slide images (WSIs). As a result, some nuclei located at the edges of patches may be cropped, with fewer than 10 visible pixels in certain cases.

## Dataset Structure

The dataset is organized into three folds: `fold1`, `fold2`, and `fold3`, consistent with the original dataset structure. Each fold contains data in a tabular format with the following four columns:

- **`image`**: The RGB tile of the sample.
- **`instances`**: A list of nuclei instances. Each instance represents exactly one nucleus and is in binary format (`1` - nucleus, `0` - background)
- **`categories`**: An integer class label for each nucleus, corresponding to one of the following categories:
  0. Neoplastic
  1. Inflammatory
  2. Connective
  3. Dead
  4. Epithelial
- **`tissue`**: The integer tissue type from which the sample originates, belonging to one of these categories:
  0. Adrenal Gland
  1. Bile Duct
  2. Bladder
  3. Breast
  4. Cervix
  5. Colon
  6. Esophagus
  7. Head & Neck
  8. Kidney
  9. Liver
  10. Lung
  11. Ovarian
  12. Pancreatic
  13. Prostate
  14. Skin
  15. Stomach
  16. Testis
  17. Thyroid
  18. Uterus

## Citation

```bibtex
@inproceedings{gamper2019pannuke,
  title={PanNuke: an open pan-cancer histology dataset for nuclei instance segmentation and classification},
  author={Gamper, Jevgenij and Koohbanani, Navid Alemi and Benes, Ksenija and Khuram, Ali and Rajpoot, Nasir},
  booktitle={European Congress on Digital Pathology},
  pages={11--19},
  year={2019},
  organization={Springer}
}
```

```bibtex
@article{gamper2020pannuke,
  title={PanNuke Dataset Extension, Insights and Baselines},
  author={Gamper, Jevgenij and Koohbanani, Navid Alemi and Graham, Simon and Jahanifar, Mostafa and Khurram, Syed Ali and Azam, Ayesha and Hewitt, Katherine and Rajpoot, Nasir},
  journal={arXiv preprint arXiv:2003.10778},
  year={2020}
}
```
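
## Example: counting nuclei per category

Because the card describes the dataset as highly imbalanced, it can help to tally the `categories` column before training. The following is a minimal sketch, not part of the original card: the helper name `count_nuclei_per_category` and the toy rows are hypothetical, and it assumes only that each row exposes `categories` as a list of integer labels in the order listed under Dataset Structure.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence

# Class names in label order, copied from the dataset card.
CATEGORY_NAMES = ["Neoplastic", "Inflammatory", "Connective", "Dead", "Epithelial"]


def count_nuclei_per_category(
    rows: Iterable[Mapping[str, Sequence[int]]],
) -> Dict[str, int]:
    """Tally nuclei per category over rows that expose a `categories` list.

    Hypothetical helper for illustration; it is not provided by the dataset.
    """
    counts: Counter = Counter()
    for row in rows:
        # Each entry in `categories` labels exactly one nucleus instance.
        counts.update(row["categories"])
    # Report every class, including classes that never occur.
    return {name: counts.get(idx, 0) for idx, name in enumerate(CATEGORY_NAMES)}


# Toy rows standing in for real samples (assumption: only `categories` is needed).
toy_rows = [
    {"categories": [0, 0, 1, 2]},
    {"categories": [4, 4, 0]},
    {"categories": []},  # hypothetical row with an empty label list
]
toy_counts = count_nuclei_per_category(toy_rows)
print(toy_counts)
```

With the Hugging Face `datasets` library installed, the rows would presumably come from something like `load_dataset("lz222/PanNuke", split="fold1")`. That call needs network access and decodes images as it iterates, so selecting only the `categories` column first is likely to be faster.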

Provided by:
lz222