
GZMH dataset for mitotic nuclei identification of breast cancer pathological images

Science Data Bank; updated 2025-03-26; indexed 2026-04-23
Download link:
https://www.scidb.cn/detail?dataSetId=da31601544ce42ec8badf85b689f64d2
Description:

I. Description of the first version of the GZMH dataset

Mitotic nuclei counting is one of the three key scoring criteria for breast cancer diagnosis and histological grading. At present it relies on pathologists examining hotspot regions of whole slide images (WSIs) under a microscope and counting mitotic nuclei by hand, which is time-consuming and labor-intensive. Deep learning-based automatic detection methods can help clinicians identify and count mitotic nuclei in breast pathology images. However, most public datasets used in current research were built for competitions, with the data selected by organizers and data providers. They differ considerably from data encountered in clinical settings, which makes them poorly suited to testing and validating model performance and generalization. We therefore release GZMH, a dataset collected in the clinical environment of Ganzhou Municipal Hospital, China.

The dataset has the following characteristics:

(1) It contains 1,534 RGB digital images at a resolution of 2084×2084 pixels, with 2,355 annotated mitotic regions.
(2) It covers a large number of cases of diverse types, so its data characteristics are closer to real clinical application scenarios.
(3) It provides both pixel-level semantic segmentation labels and object detection labels (the coordinates of the minimum bounding rectangle and of the centroid of each segmented nuclear region), which makes it convenient for research.

Citations of the following papers on the GZMH dataset are welcome:

[1] Huadeng Wang, Hao Xu, Bingbing Li*, Xipeng Pan*, Lingqi Zeng, Rushi Lan, Xiaonan Luo. A novel dataset and a two-stage mitosis nuclei detection method based on hybrid anchor branch. Biomedical Signal Processing and Control, Volume 87, Part A, 2024, 105374. (Chinese Academy of Sciences (CAS) Zone 2 SCI journal, IF 5.1)
[2] Huadeng Wang, Zhipeng Liu, Xipeng Pan, Kang Yu, Rushi Lan, Junlin Guan, Bingbing Li. A novel dataset and a two-stage deep learning method for breast cancer mitosis nuclei identification. Digital Signal Processing, Volume 158, 2025, 104978. (JCR Zone 2 and CAS Zone 3 SCI journal, IF 2.9)
[3] Huadeng Wang, Xuexin Wang, Bingbing Li*, Zhipeng Liu, Hao Xu, Xipeng Pan, Rushi Lan, Xiaonan Luo. GZMH: A breast cancer histopathological image dataset for mitotic nuclei detection and segmentation[J]. Journal of Image and Graphics, 2024, 29(3). (CCF B-class Chinese core journal)

II. Description of the updated second version, GZMH-V2

Interested researchers are encouraged to use the GZMH-V2 dataset (the uploaded GZMH Dataset V2.zip) whenever possible.

In clinical practice, for highly malignant, poorly differentiated breast cancer, the histological grade can largely be determined as grade 3 from two parameters alone: the proportion of gland (tubule) formation and nuclear pleomorphism. For less malignant, better-differentiated breast cancer, these two parameters are not sufficient, and the mitotic count is needed to help clinicians decide whether the final grade is 2 or 3. Starting from the original GZMH dataset, we therefore scored each WSI by counting mitotic figures in the 10 high-power fields (HPFs) with the highest mitotic counts among those cropped from it, partitioned the test set according to this score, and randomly sampled the result to generate two test sets: test_1or2_score and test_3_score.
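The relationship between the two label types, and the top-10 HPF counting behind the V2 test split, can be illustrated with a short sketch. This is not the authors' code. It assumes an axis-aligned bounding rectangle, (x, y) pixel coordinates with the origin at the top-left, one binary mask per nucleus, and per-HPF mitotic counts that are already available; the function names `bbox_and_centroid` and `top_hpf_counts` are hypothetical, and the dataset's actual file layout and coordinate convention should be checked against the archive.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

Box = Tuple[int, int, int, int]
Point = Tuple[float, float]


def bbox_and_centroid(mask: Sequence[Sequence[int]]) -> Tuple[Box, Point]:
    """Return ((x_min, y_min, x_max, y_max), (cx, cy)) for one nucleus mask.

    Assumption: `mask` is a 2D grid of 0/1 values holding a single
    segmented region; the rectangle is axis-aligned and inclusive.
    """
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(y)
    if not xs:
        raise ValueError("mask contains no foreground pixels")
    box = (min(xs), min(ys), max(xs), max(ys))
    # Centroid as the mean position of all foreground pixels.
    centroid = (sum(xs) / len(xs), sum(ys) / len(ys))
    return box, centroid


def top_hpf_counts(hpf_counts: Iterable[Tuple[str, int]], k: int = 10) -> Dict[str, int]:
    """Sum the mitotic counts of the k HPFs with the most mitoses per WSI.

    Assumption: each item is (wsi_id, mitotic_count_in_one_hpf).
    """
    per_wsi = defaultdict(list)
    for wsi_id, count in hpf_counts:
        per_wsi[wsi_id].append(count)
    return {
        wsi_id: sum(sorted(counts, reverse=True)[:k])
        for wsi_id, counts in per_wsi.items()
    }


if __name__ == "__main__":
    toy_mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0],
    ]
    print(bbox_and_centroid(toy_mask))
    print(top_hpf_counts([("wsi_a", 3), ("wsi_a", 1), ("wsi_a", 5), ("wsi_b", 2)], k=2))
```

Turning the summed count into a mitotic score of 1, 2, or 3 requires cut-offs that depend on the microscope field area. The description does not state which cut-offs were used, so that step is deliberately left out.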
Provided by:
Xipeng Pan; Rushi Lan; Huadeng Wang; Guilin University of Electronic Technology; Bingbing Li; Xuexin Wang; Xiaonan Luo; Zhipeng Liu
Created:
2023-05-29