
GZMH dataset for mitotic nuclei identification of breast cancer pathological images

Source: DataCite Commons | Updated: 2025-04-27 | Indexed: 2025-04-16
Download link:
https://www.scidb.cn/detail?dataSetId=da31601544ce42ec8badf85b689f64d2
Resource description:
1. GZMH Dataset Version 1 Description

Mitotic nuclei counting is one of the three key scoring indicators for breast cancer diagnosis and histological grading. Currently, it relies on pathologists examining hotspot regions of pathological images under a microscope and manually reviewing and counting mitotic nuclei in whole-slide images (WSIs), which is time-consuming and labor-intensive. Deep learning-based automatic detection methods can help clinicians identify and count mitotic nuclei in breast pathological images. However, most public datasets used in current research were built for competitions, with the data selected by organizers and data providers. They differ considerably from data encountered in clinical settings, which makes them less suitable for testing and validating model performance and generalization. We therefore release the GZMH dataset, collected in the clinical environment of Ganzhou Municipal Hospital, China.

The dataset has the following characteristics:
(1) It contains 1,534 RGB digital images with a resolution of 2084 × 2084 pixels and a total of 2,355 mitotic regions.
(2) It covers a large number of cases of diverse types, with data characteristics closer to clinical application scenarios.
(3) It provides pixel-level semantic segmentation labels and object detection labels (the minimum bounding rectangle coordinates and centroid coordinates of each segmented nucleus region), which facilitates research use.

Researchers are welcome to cite the following papers on the GZMH dataset:
[1] Huadeng Wang, Hao Xu, Bingbing Li*, Xipeng Pan*, Lingqi Zeng, Rushi Lan, Xiaonan Luo. A novel dataset and a two-stage mitosis nuclei detection method based on hybrid anchor branch. Biomedical Signal Processing and Control, Volume 87, Part A, 2024, 105374. (Chinese Academy of Sciences (CAS) Zone 2 SCI journal, Impact Factor (IF) 5.1)
[2] Huadeng Wang, Zhipeng Liu, Xipeng Pan, Kang Yu, Rushi Lan, Junlin Guan, Bingbing Li. A novel dataset and a two-stage deep learning method for breast cancer mitosis nuclei identification. Digital Signal Processing, Volume 158, 2025, 104978. (JCR Zone 2 and CAS Zone 3 SCI journal, IF 2.9)
[3] Huadeng Wang, Xuexin Wang, Bingbing Li*, Zhipeng Liu, Hao Xu, Xipeng Pan, Rushi Lan, Xiaonan Luo. GZMH: A Breast Cancer Histopathological Image Dataset for Mitotic Nuclei Detection and Segmentation[J]. Journal of Image and Graphics, 2024, 29(3). (CCF Class B Chinese core journal)

2. Description of the Updated GZMH Dataset Version 2 (GZMH-V2)

Interested researchers are encouraged to use GZMH-V2 (the uploaded GZMH Dataset V2.zip) where possible. In clinical practice, for breast cancer with high malignancy and high differentiation, the histological grade can largely be determined as grade 3 from two parameters alone: the proportion of tubule formation and nuclear pleomorphism. For breast cancer with low malignancy and low differentiation, these two parameters are not sufficient, and the mitotic count is needed to help clinicians decide whether the final grade is 2 or 3. Therefore, starting from the GZMH dataset, we scored each WSI by counting mitotic figures in the 10 high-power fields (HPFs) cropped from it that contain the most mitotic figures, partitioned the test set accordingly, and randomly selected samples to produce two test sets: test_1or2_score and test_3_score.
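The description states that the detection labels are the minimum bounding rectangle and centroid of each segmented nucleus region. As a rough illustration of how such labels can be derived from a pixel-level mask, the sketch below labels connected foreground regions and reports an axis-aligned box and a centroid for each. It is not the authors' tooling: the function name `mask_to_detection_labels`, the use of 8-connectivity, the axis-aligned interpretation of "minimum bounding rectangle", and the output layout are all assumptions made for illustration, and the file format of the released labels is not specified here.

```python
from collections import deque


def mask_to_detection_labels(mask):
    """Return one label dict per 8-connected foreground region of a binary mask.

    mask: list of rows, each a list of 0/1 values (1 = nucleus pixel).
    The name, connectivity, and output layout are illustrative assumptions.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    labels = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill collects the pixels of one region.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            xs, ys = [], []
            while queue:
                x, y = queue.popleft()
                xs.append(x)
                ys.append(y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            labels.append({
                # Axis-aligned box as (x_min, y_min, x_max, y_max), inclusive.
                "bbox": (min(xs), min(ys), max(xs), max(ys)),
                # Centroid as the mean pixel coordinate (x, y).
                "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
                "area": len(xs),
            })
    return labels


# Toy 4 x 5 mask with two separate regions (not real GZMH data).
toy_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
toy_labels = mask_to_detection_labels(toy_mask)
```

On full 2084 × 2084 masks a vectorized library routine for connected-component labeling would be far faster; the pure-Python version is kept only so the logic is visible without extra dependencies.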
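The GZMH-V2 split described above scores each WSI from the 10 HPFs that contain the most mitotic figures. A minimal sketch of that counting step follows. The function names and the `score3_min_count` cut-off are hypothetical: the description does not state the thresholds the authors used, and the subsequent random selection step is not modeled.

```python
def top_hpf_mitotic_count(hpf_counts, k=10):
    """Sum the mitotic-figure counts of the k HPFs with the most figures."""
    return sum(sorted(hpf_counts, reverse=True)[:k])


def assign_test_subset(hpf_counts, score3_min_count, k=10):
    """Pick a GZMH-V2 test subset name for one WSI.

    score3_min_count is a hypothetical cut-off supplied by the caller;
    the actual threshold used by the dataset authors is not given.
    """
    total = top_hpf_mitotic_count(hpf_counts, k)
    return "test_3_score" if total >= score3_min_count else "test_1or2_score"


# Illustrative per-HPF counts for one WSI (made-up numbers).
example_counts = [0, 3, 1, 0, 2, 5, 0, 1, 4, 0, 2, 1]
example_total = top_hpf_mitotic_count(example_counts)
# The cut-off of 15 is a placeholder, not a value from the dataset description.
example_subset = assign_test_subset(example_counts, score3_min_count=15)
```

Keeping the cut-off as an explicit argument avoids baking in a value the description does not provide; in grading practice such thresholds depend on the microscope field area.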
Provider:
Science Data Bank
Created:
2023-05-30
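Background overview
GZMH is a clinical dataset for identifying mitotic nuclei in breast cancer pathological images. It contains 1,534 high-resolution images with rich annotations and supports research on and validation of deep learning methods. The second version, GZMH-V2, further refines the test set to suit studies of breast cancers of differing malignancy.
The summary above was compiled and generated by 遇见数据集.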