CPIA Dataset_Part04: A Comprehensive Pathological Image Analysis Dataset for Self-supervised Learning Pre-training
Source: Science Data Bank. Updated 2024-04-16; indexed 2026-04-23.
Download link:
https://www.scidb.cn/detail?dataSetId=8e003efabab74c1699b3fd297b5e7da9
Description:
Pathological image analysis is a crucial field in computer-aided diagnosis. Transfer learning using models initialized on natural images has improved downstream performance on pathological tasks. However, the lack of sophisticated domain-specific pathological initialization limits the potential of these models. Self-supervised learning (SSL) enables pre-training without sample-level labels, overcoming the challenge of expensive annotations. The field therefore calls for a comprehensive dataset, similar to ImageNet in computer vision. This paper presents a large-scale comprehensive pathological image analysis (CPIA) dataset for SSL pre-training. The CPIA dataset contains 148,962,579 images, covering over 48 organs/tissues and approximately 100 kinds of diseases, and includes two main data types: whole slide images (WSIs) and characteristic regions of interest (ROIs). We also establish a multi-scale pathological data processing workflow that incorporates the diagnostic habits of senior pathologists. The CPIA dataset facilitates a comprehensive pathological understanding and enables pattern discovery explorations. Additionally, to launch the CPIA dataset, several state-of-the-art (SOTA) baselines of SSL pre-training and downstream evaluation were conducted. This is Part04 of the CPIA dataset, which includes CPIA-Mini and part of the full CPIA dataset. The related code and information are available at https://github.com/zhanglab2021/CPIA_Dataset.
Provided by:
Yunlu Feng; Guanglei Zhang; Tianyi Zhang; Peking Union Medical College Hospital; Zeyu Liu; Yanli Lei; Sicheng Chen; Shangqing Lyu; Beihang University
Created:
2023-12-27



