H&E and immunohistochemical stain images of 209 cases of diffuse large B-cell lymphoma linked with cytogenetic features and clinical outcomes
Source: DataCite Commons | Updated: 2025-06-01 | Indexed: 2025-04-16
Download link:
https://www.cancerimagingarchive.net/collection/dlbcl-morphology/
Description:
Diffuse Large B-Cell Lymphoma (DLBCL) is the most common non-Hodgkin lymphoma worldwide. DLBCL is fatal without treatment, but early detection and therapy can cure up to 70% of patients. The current best prognostic classification, the National Comprehensive Cancer Network International Prognostic Index, is insufficient to guide therapeutic decision-making for individual patients. No tumor-intrinsic prognostication method is currently available.
The DLBCL-Morph dataset contains 42 digital high-magnification scans of tissue microarrays (TMAs) containing tissue cores from 209 DLBCL cases at Stanford Hospital. Each DLBCL case is accompanied by survival data, follow-up status, and a wide variety of clinical and cytogenetic variables. The TMAs are stained with H&E, which shows cell morphology, and for the expression of several prognostically relevant proteins: CD10, BCL6, MUM1, BCL2, and MYC. The TMAs are accompanied by pathologist-annotated regions of interest (ROIs) that mark areas of tissue representative of DLBCL. We used deep learning to segment cancerous nuclei within the ROIs and computed several geometric features for each cancerous nucleus; these features are provided as part of the dataset. They quantify morphologic properties of a nucleus, such as size and elongation, and can be used as input to automated prognostic models that predict survival. In addition, DLBCL-Morph contains 204 digital high-magnification whole-slide images (WSIs) from 149 DLBCL cases, stained with H&E.
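To make the idea of per-nucleus geometric features concrete, the following is a minimal sketch of how size and elongation could be derived from one binary nucleus mask. It is an illustration only: the function name `nucleus_descriptors`, the mask representation (a list of rows of 0/1 values), and the moment-based definition of elongation are assumptions, not the definitions used to build the features shipped with DLBCL-Morph.

```python
import math


def nucleus_descriptors(mask):
    """Compute illustrative shape descriptors for one binary nucleus mask.

    `mask` is a list of rows containing 0/1 values (hypothetical format).
    Elongation is defined here as the ratio of the standard deviations of
    the pixel coordinates along the two principal axes (an assumption).
    """
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    area = len(pixels)
    if area == 0:
        raise ValueError("mask contains no foreground pixels")

    # Centroid of the foreground pixels (row, column).
    cy = sum(r for r, _ in pixels) / area
    cx = sum(c for _, c in pixels) / area

    # Second central moments of the pixel coordinates.
    myy = sum((r - cy) ** 2 for r, _ in pixels) / area
    mxx = sum((c - cx) ** 2 for _, c in pixels) / area
    mxy = sum((r - cy) * (c - cx) for r, c in pixels) / area

    # Eigenvalues of the 2x2 covariance matrix [[mxx, mxy], [mxy, myy]].
    half_trace = (mxx + myy) / 2
    spread = math.sqrt(((mxx - myy) / 2) ** 2 + mxy ** 2)
    lam_major = half_trace + spread
    lam_minor = half_trace - spread

    # A degenerate mask (e.g. a single row of pixels) has zero minor variance.
    elongation = math.sqrt(lam_major / lam_minor) if lam_minor > 0 else math.inf
    return {"area": area, "centroid": (cy, cx), "elongation": elongation}


if __name__ == "__main__":
    square = [[1, 1, 1] for _ in range(3)]   # compact, roughly round shape
    bar = [[1] * 6 for _ in range(2)]        # stretched shape
    print(nucleus_descriptors(square))
    print(nucleus_descriptors(bar))
```

Under this definition a compact shape scores close to 1 and a stretched shape scores higher, which is the kind of contrast a downstream survival model could use.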
A total of 152,194 patches (240 × 240 pixels each) were extracted from the H&E-stained ROIs, and a HoVer-Net model was used to segment tumor nuclei (1,035,909 binary masks). Geometric descriptors were computed for each segmented nucleus, and a Cox proportional hazards model was evaluated using (A) only clinical features, (B) only morphologic features, or (C) both sets of features. The Cox model achieved a concordance index of (A) 0.703 (p = 0.005), (B) 0.645 (p = 0.07), and (C) 0.723 (p < 0.001) on a randomly sampled validation set of 51 patients. Our findings suggest that a risk calculator based on both clinical and morphologic data could yield improved prognostic value for DLBCL without the need for additional diagnostic testing.
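For readers unfamiliar with the concordance index reported above, here is a minimal sketch of Harrell's C for right-censored survival data. It assumes higher risk scores should correspond to shorter survival and counts tied scores as half-concordant; the function name and the toy data are hypothetical, and this is not the authors' evaluation code.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (illustrative).

    times       -- observed follow-up times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if the subject with the shorter
            # time actually experienced the event.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data (made up for illustration, not taken from DLBCL-Morph).
    times = [5, 10, 15, 20]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.6, 0.7, 0.1]
    print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which places the reported values of 0.645 to 0.723 between the two.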
Several studies have thus far failed to conclusively demonstrate that morphologic classification can predict outcomes in DLBCL. Automated medical imaging methods applied to whole-slide images (WSIs) could potentially identify novel, prognostically significant morphological or immunohistochemical biomarkers. Automated methods have already been shown to identify prognostically relevant features on H&E sections that have eluded pathologists (Beck et al., Science Translational Medicine). Furthermore, if successful, automated image analysis could potentially be scaled up into a cost-effective alternative to current classification methods, which are typically costly and/or labor intensive. A critical requirement for developing such deep learning models is the availability of datasets containing WSIs appropriately stained to show cell morphology and oncogene expression, with accompanying prognostic outcome data.
Provider:
The Cancer Imaging Archive
Created:
2022-03-25
Collected summary
Dataset introduction

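Background and challenges
Background overview
This dataset is a multimodal pathology image collection for diffuse large B-cell lymphoma (DLBCL). It contains H&E and immunohistochemical stain images from 209 cases, linked to clinical outcomes, cytogenetic features, and pathologist annotations. Cancerous nuclei were segmented with deep learning and geometric features were extracted from them. In a Cox prognostic model, combining these morphologic features with clinical data raised the concordance index from 0.703 (clinical features only) to 0.723 on a 51-patient validation set. The aim is to support automated, cost-effective prognostic tools that improve the management of DLBCL patients.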
The summary above was collected and generated by 遇见数据集.



