
RuiPath Pathology Vision Foundation Model and Evaluation Dataset: Open-Source Release Notes

Source: ModelScope community. Updated 2026-05-19; indexed 2025-07-05.
Download link:
https://modelscope.cn/datasets/Ruijin_Hospital/RuiPath_Open_Source_Intro
Resource description:

# Open Source Description of the RuiPath Pathology Vision Foundation Model and Evaluation Dataset

### About RuiPath

The RuiPath pathology model is a multimodal pathology foundation model developed by Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine on top of Huawei's DSC AI solution. In June 2025, we open-sourced the core components of the RuiPath model: the RuiPath Pathology Vision Foundation Model V1.0 (RuiPath_VisionFoundation_V1.0) and the RuiPath Pan-Cancer Benchmark WSI Dataset V1.0 (RuiPath_Pan_Cancer_Benchmark_WSI_DataSet-V1.0). The sections below describe what this release contains. To request access to the open-source model and benchmark dataset, download the usage application form from the [Dataset Files] page and send it to contact@shdmic.com.

To obtain approval, fill out and send the application form from the institutional email address of your current employer, and make sure the information you provide is true, accurate, and valid. Once approved, you may not distribute, publish, or reproduce the RuiPath model or the benchmark dataset. Other members of your organization who wish to use the RuiPath model and benchmark dataset must register a new account and apply for approval separately. Users must not attempt, in any form, to re-identify the de-identified data used to develop the RuiPath model.

### Introduction to RuiPath Pathology Vision Foundation Model V1.0

RuiPath-VisionFoundation-V1.0 is based on the [ViT](https://arxiv.org/abs/2010.11929) architecture and was trained by self-supervised learning, following the [DINOv2](https://github.com/facebookresearch/dinov2) approach, on a million-scale collection of pathology WSI images from Ruijin Hospital. The training data covers the cancer types that together account for 90% of cancer cases in China.
#### Introduction to ViT Architecture

The model architecture is as follows:

```
{
  "architecture": "vit_large_patch16_224",
  "patch_size": 16,
  "img_size": 224,
  "num_classes": 0,
  "num_features": 1024,
  "global_pool": "token"
}
```

### Evaluation of Feature Extraction Capability of RuiPath-VisionFoundation-V1.0

12 third-party open-source computational pathology datasets were used for testing:

#### BACH Dataset

- **Basic Information:** BACH is a breast cancer histopathology image dataset containing 400 microscopy images in 4 categories: normal, benign, carcinoma in situ, and invasive carcinoma.

#### BCNB Dataset

- **Basic Information:** Early breast cancer core needle biopsy WSI dataset, including WSIs and clinical data.

#### BRACS and BRACS ROI Datasets

- **Basic Information:** BRACS contains 547 WSIs and 4,539 ROIs, used for breast cancer subtyping.

#### CPTAC_COAD Dataset

- **Basic Information:** Colon adenocarcinoma WSI dataset from CPTAC.

#### LC25000 Colon and LC25000 Lung Datasets

- **Basic Information:** Parts of the LC25000 dataset, containing 5,000 images each for colon cancer and lung cancer, respectively.

#### MHIST Dataset

- **Basic Information:** Small histopathology image analysis dataset with 3,152 images for colorectal polyp classification.

#### NATBRCA Dataset

- **Basic Information:** Breast cancer dataset collected after neoadjuvant chemotherapy.

#### Pan-Nuke Dataset

- **Basic Information:** Pan-cancer nuclear segmentation and classification dataset covering 19 tissue types.

#### PatchCamelyon Dataset

- **Basic Information:** Dataset for detecting breast cancer lymph node metastases.

#### Renal_cell_binary_lymphocytes Dataset

- **Basic Information:** Likely used for binary classification tasks in renal cell carcinoma involving lymphocyte infiltration.

#### WSSS4LUAD Dataset

- **Basic Information:** Weakly supervised semantic segmentation dataset for lung adenocarcinoma.
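
As a small worked example, the ViT configuration listed under "Introduction to ViT Architecture" fixes the shape of the features the model returns. The sketch below derives the patch grid, sequence length, and feature dimension from those values. The helper name `token_layout` and the `num_prefix_tokens=1` default (one class token, no register tokens) are assumptions for illustration; the source does not state them.

```python
import json

# Configuration values copied from the model description.
CONFIG_JSON = """
{
  "architecture": "vit_large_patch16_224",
  "patch_size": 16,
  "img_size": 224,
  "num_classes": 0,
  "num_features": 1024,
  "global_pool": "token"
}
"""


def token_layout(config: dict, num_prefix_tokens: int = 1) -> dict:
    """Derive the token layout of a ViT from its configuration.

    num_prefix_tokens is an assumption (one class token, no register
    tokens); the model description does not specify it.
    """
    img, patch = config["img_size"], config["patch_size"]
    if img % patch != 0:
        raise ValueError("img_size must be a multiple of patch_size")
    grid = img // patch              # patches per image side
    num_patches = grid * grid        # patch tokens per tile
    return {
        "grid": (grid, grid),
        "num_patches": num_patches,
        "sequence_length": num_patches + num_prefix_tokens,
        "embedding_dim": config["num_features"],
        # num_classes == 0 means no classification head is attached
        "has_classifier_head": config["num_classes"] > 0,
    }


config = json.loads(CONFIG_JSON)
layout = token_layout(config)
print(layout)
```

Under these assumptions, each 224×224 tile becomes a 14×14 grid of 196 patch tokens plus one class token. If the configuration follows timm conventions, which the architecture string suggests, `num_classes` of 0 together with `global_pool` of `"token"` would mean the model outputs one 1024-dimensional class-token vector per tile, suitable as input to a downstream classifier.
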
### License

[CC-BY-NC-SA](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.zh-hans)

### RuiPath Pan-Cancer Benchmark WSI Dataset V1.0

This is a high-quality professional dataset built for tumor pathological diagnosis and AI model training, with the following core characteristics:

1. Multi-cancer coverage: digitized whole slide images (WSIs) of high-incidence tumors such as breast cancer, lung cancer, and colorectal cancer, with slide-level pathological diagnosis labels that have undergone strict quality control.
2. Authoritative data source: all samples come from clinical biopsy or surgical resection specimens approved by the Ethics Committee of Ruijin Hospital, ensuring data compliance and clinical representativeness.
3. Expert-level annotation: diagnostic labels (including key pathological indicators such as tumor presence or absence, histological subtype, and tumor grade) were annotated by a team of senior pathologists, following domestic and international pathology reporting standards.
4. Clinical-scenario fit: the data selection strategy simulates real diagnostic workflows, and the data has undergone standardized preprocessing.

The dataset contains 700 high-quality WSI images with labels for computational pathology downstream tasks, covering the following 7 cancer types:

- Breast Cancer
- Colorectal Cancer
- Thyroid Cancer
- Gastric Cancer
- Pancreatic Cancer
- Prostate Cancer
- Lung Cancer

### License

[CC-BY-NC-ND](https://creativecommons.org/licenses/by-nc-nd/4.0/deed.zh-hans)
Provider: maas
Created: 2025-06-29
Collected summary
Dataset introduction
Background overview
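The RuiPath Pathology Vision Foundation Model and Evaluation Dataset release comprises a large-scale multimodal pathology model and an evaluation dataset. The model's training data covers cancer types accounting for 90% of cancer cases in China, and the evaluation dataset contains 700 high-quality WSI images with expert annotations for 7 common cancer types. It was developed by Ruijin Hospital on top of Huawei's DSC AI solution and is intended for tumor pathological diagnosis and AI model training.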
The summary above was collected and generated by 遇见数据集.