lz222/PanNuke
Source: Hugging Face. Updated 2026-03-23; indexed 2026-03-29.
Download link: https://hf-mirror.com/datasets/lz222/PanNuke
---
dataset_info:
  features:
  - name: image
    dtype:
      image:
        mode: RGB
  - name: instances
    sequence:
      image:
        mode: '1'
  - name: categories
    sequence:
      class_label:
        names:
          '0': Neoplastic
          '1': Inflammatory
          '2': Connective
          '3': Dead
          '4': Epithelial
  - name: tissue
    dtype:
      class_label:
        names:
          '0': Adrenal Gland
          '1': Bile Duct
          '2': Bladder
          '3': Breast
          '4': Cervix
          '5': Colon
          '6': Esophagus
          '7': Head & Neck
          '8': Kidney
          '9': Liver
          '10': Lung
          '11': Ovarian
          '12': Pancreatic
          '13': Prostate
          '14': Skin
          '15': Stomach
          '16': Testis
          '17': Thyroid
          '18': Uterus
  splits:
  - name: fold1
    num_bytes: 283673837.64
    num_examples: 2656
  - name: fold2
    num_bytes: 267595457.439
    num_examples: 2523
  - name: fold3
    num_bytes: 293079722.82
    num_examples: 2722
  download_size: 1665092597
  dataset_size: 844349017.8989999
configs:
- config_name: default
  data_files:
  - split: fold1
    path: data/fold1-*
  - split: fold2
    path: data/fold2-*
  - split: fold3
    path: data/fold3-*
license: cc-by-nc-sa-4.0
task_categories:
- image-segmentation
task_ids:
- instance-segmentation
language:
- en
tags:
- medical
- cell nuclei
- H&E
pretty_name: PanNuke
size_categories:
- 1K<n<10K
paperswithcode_id: pannuke
---
# PanNuke
## Dataset Description
- **Homepage:** [PanNuke Dataset for Nuclei Instance Segmentation and Classification](https://warwick.ac.uk/fac/cross_fac/tia/data/pannuke)
- **Leaderboard:** [Panoptic Segmentation](https://paperswithcode.com/sota/panoptic-segmentation-on-pannuke)
## Description
PanNuke is a semi-automatically generated dataset for nuclei instance segmentation and classification, with nuclei annotations spanning 19 tissue types and 5 cell categories. It contains **7,901 images** of **256×256 pixels** each and a total of **189,744 labeled nuclei**, each with its own instance segmentation mask. The images were captured at **40× magnification** with a resolution of **0.25 µm/pixel**. The class distribution is highly imbalanced, and the **"Dead" nuclei category** is particularly underrepresented.
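Because the class distribution is skewed, training code often reweights the loss per class. The following is a minimal sketch, not part of the PanNuke release, that derives inverse-frequency weights from per-image lists of integer class labels. The function name `inverse_frequency_weights` and the toy labels are illustrative assumptions; real counts should be computed from the dataset itself.

```python
from collections import Counter

# Class names in label order, as listed in the dataset metadata.
CATEGORY_NAMES = ["Neoplastic", "Inflammatory", "Connective", "Dead", "Epithelial"]


def inverse_frequency_weights(label_lists, num_classes=len(CATEGORY_NAMES)):
    """Return one weight per class, proportional to 1 / class frequency.

    `label_lists` is an iterable of per-image lists of integer class labels.
    Weights are normalised to average 1 over the classes that occur;
    classes that never occur get weight 0.0.
    """
    counts = Counter(label for labels in label_lists for label in labels)
    total = sum(counts.values())
    raw = {c: total / counts[c] for c in range(num_classes) if counts[c] > 0}
    mean_raw = sum(raw.values()) / len(raw) if raw else 1.0
    return [raw[c] / mean_raw if c in raw else 0.0 for c in range(num_classes)]


# Hypothetical toy labels for three images, NOT real PanNuke statistics.
toy_labels = [[0, 0, 0, 1], [0, 2, 2, 3], [0, 0, 4, 1]]
for name, weight in zip(CATEGORY_NAMES, inverse_frequency_weights(toy_labels)):
    print(f"{name:12s} {weight:.3f}")
```

Inverse-frequency weighting is only one option; it can over-emphasise very rare classes, so clipping the weights or using a sampling-based strategy may work better in practice.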
Please note that the dataset was created by extracting patches from whole-slide images (WSIs). As a result, some nuclei located at the edges of patches may be cropped, with fewer than 10 visible pixels in certain cases.
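If such fragments are unwanted, they can be screened by foreground area. The sketch below assumes each nucleus is available as a binary mask (a PIL image in mode `'1'` or any array-like) with a matching integer label. The helper name `filter_small_nuclei` and the choice to drop rather than keep small fragments are assumptions made for illustration, and the default threshold of 10 simply mirrors the figure quoted above.

```python
import numpy as np


def filter_small_nuclei(instances, categories, min_pixels=10):
    """Keep only nuclei whose mask has at least `min_pixels` foreground pixels.

    `instances` holds binary masks (PIL mode '1' images or array-likes);
    `categories` holds the matching integer class labels.
    """
    kept_masks, kept_labels = [], []
    for mask, label in zip(instances, categories):
        # np.asarray converts a PIL image to an array; non-zero means nucleus.
        area = int(np.count_nonzero(np.asarray(mask)))
        if area >= min_pixels:
            kept_masks.append(mask)
            kept_labels.append(label)
    return kept_masks, kept_labels


# Synthetic masks for demonstration only.
cropped = np.zeros((256, 256), dtype=bool)
cropped[0, :4] = True            # 4 visible pixels at the tile edge
whole = np.zeros((256, 256), dtype=bool)
whole[100:110, 100:110] = True   # 100 visible pixels

masks, labels = filter_small_nuclei([cropped, whole], [1, 0])
print(len(masks), labels)
```

Whether to discard these fragments depends on the task: evaluation protocols that count every annotated nucleus may require keeping them.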
## Dataset Structure
The dataset is organized into three folds: `fold1`, `fold2`, and `fold3`, consistent with the original dataset structure. Each fold contains data in a tabular format with the following four columns:
- **`image`**: The RGB tile of the sample.
- **`instances`**: A list of nuclei instances. Each instance is a binary mask covering exactly one nucleus (`1` = nucleus, `0` = background).
- **`categories`**: An integer class label for each nucleus, corresponding to one of the following categories:
  - `0`: Neoplastic
  - `1`: Inflammatory
  - `2`: Connective
  - `3`: Dead
  - `4`: Epithelial
- **`tissue`**: The integer tissue type from which the sample originates, belonging to one of these categories:
  - `0`: Adrenal Gland
  - `1`: Bile Duct
  - `2`: Bladder
  - `3`: Breast
  - `4`: Cervix
  - `5`: Colon
  - `6`: Esophagus
  - `7`: Head & Neck
  - `8`: Kidney
  - `9`: Liver
  - `10`: Lung
  - `11`: Ovarian
  - `12`: Pancreatic
  - `13`: Prostate
  - `14`: Skin
  - `15`: Stomach
  - `16`: Testis
  - `17`: Thyroid
  - `18`: Uterus
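The columns above can be combined into dense label maps, which many segmentation pipelines expect. The following is a minimal sketch under stated assumptions: the repository id `lz222/PanNuke` is taken from the page title, loading relies on the Hugging Face `datasets` library, and the helper names `load_fold` and `build_label_maps` are hypothetical. Shifting class labels by one so that `0` can mean background is a convention chosen here, not something the dataset prescribes.

```python
import numpy as np


def load_fold(name="fold1"):
    """Load one fold from the Hub (needs `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily; optional dependency

    # Assumption: the dataset is hosted under the id shown in the page title.
    return load_dataset("lz222/PanNuke", split=name)


def build_label_maps(instances, categories, shape=(256, 256)):
    """Merge per-nucleus binary masks into two integer maps.

    Returns (instance_map, class_map):
      instance_map: 0 = background, k = k-th nucleus (1-based)
      class_map:    0 = background, c + 1 = class label c
    If masks overlap, the later nucleus overwrites the earlier one.
    """
    instance_map = np.zeros(shape, dtype=np.int32)
    class_map = np.zeros(shape, dtype=np.int32)
    for instance_id, (mask, label) in enumerate(zip(instances, categories), start=1):
        foreground = np.asarray(mask).astype(bool)
        instance_map[foreground] = instance_id
        class_map[foreground] = label + 1
    return instance_map, class_map


# Intended usage (commented out because it downloads data):
# sample = load_fold("fold1")[0]
# instance_map, class_map = build_label_maps(sample["instances"], sample["categories"])
```

Keeping the per-nucleus masks separate, as the dataset does, preserves overlapping nuclei; the merged maps trade that information for a compact representation.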
## Citation
```bibtex
@inproceedings{gamper2019pannuke,
title={PanNuke: an open pan-cancer histology dataset for nuclei instance segmentation and classification},
author={Gamper, Jevgenij and Koohbanani, Navid Alemi and Benes, Ksenija and Khuram, Ali and Rajpoot, Nasir},
booktitle={European Congress on Digital Pathology},
pages={11--19},
year={2019},
organization={Springer}
}
```
```bibtex
@article{gamper2020pannuke,
title={PanNuke Dataset Extension, Insights and Baselines},
author={Gamper, Jevgenij and Koohbanani, Navid Alemi and Graham, Simon and Jahanifar, Mostafa and Khurram, Syed Ali and Azam, Ayesha and Hewitt, Katherine and Rajpoot, Nasir},
journal={arXiv preprint arXiv:2003.10778},
year={2020}
}
```
Provider: lz222



