CPIA Dataset_Part02: A Comprehensive Pathological Image Analysis Dataset for Self-supervised Learning Pre-training

Name: CPIA Dataset_Part02: A Comprehensive Pathological Image Analysis Dataset for Self-supervised Learning Pre-training
Creator: Science Data Bank
Published: 2025-04-27 21:52:22
License: No description available

Source: DataCite Commons (updated 2025-04-27; indexed 2025-05-18)

Download link:

https://www.scidb.cn/detail?dataSetId=1d693905f1194e558a54bd3d1c9f6047

Description:

Pathological image analysis is a crucial field in computer-aided diagnosis. Transfer learning from models initialized on natural images has improved performance on downstream pathology tasks. However, the lack of domain-specific pathological initialization limits the potential of these models. Self-supervised learning (SSL) enables pre-training without sample-level labels, overcoming the challenge of expensive annotations. The field therefore calls for a comprehensive dataset, similar to ImageNet in computer vision. This paper presents a large-scale comprehensive pathological image analysis (CPIA) dataset for SSL pre-training. The CPIA dataset contains 148,962,579 images covering over 48 organs/tissues and approximately 100 kinds of diseases, and it includes two main data types: whole slide images (WSIs) and characteristic regions of interest (ROIs). We also establish a multi-scale pathological data processing workflow that incorporates the diagnostic habits of senior pathologists. The CPIA dataset facilitates a comprehensive understanding of pathology and enables pattern discovery. Additionally, to accompany the release of the CPIA dataset, several state-of-the-art (SOTA) SSL pre-training baselines and downstream evaluations are conducted. This is Part02 of the CPIA dataset, which includes CPIA-Mini and part of the full CPIA dataset. The related code and information are available at https://github.com/zhanglab2021/CPIA_Dataset.

Provider:

Science Data Bank

Created:

2024-03-12