Table 4_Machine learning model for predicting epidermal growth factor receptor expression status in breast cancer using ultrasound radiomics.xlsx
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Table_4_Machine_learning_model_for_predicting_epidermal_growth_factor_receptor_expression_status_in_breast_cancer_using_ultrasound_radiomics_xlsx/30381700
Description:
Background/objectives: The epidermal growth factor receptor (EGFR) is a clinically important target because its expression in patients with breast cancer influences both overall and disease-free survival. Current methods for assessing EGFR expression status are invasive. In this study, we therefore developed a machine learning approach that uses ultrasound radiomics to predict EGFR expression status non-invasively in patients with breast cancer.
Methods: Radiomic features were extracted from grayscale and wavelet-transformed ultrasound images of 321 patients. The dataset was randomly split into training (n = 225) and test (n = 96) sets at a 7:3 ratio, with stratified sampling to preserve the EGFR+/– ratio. Key predictors were identified in a multi-step procedure: reproducibility filtering (intraclass correlation coefficient, ICC > 0.75), univariate F-test filtering (p < 0.05), and L1-regularized selection via LASSO regression. Seven machine learning models were trained. Model interpretability was assessed using SHAP (Shapley Additive Explanations). In addition to the hold-out evaluation, we performed stratified 10-fold cross-validation to reduce selection bias.
Results: The random forest model performed best, with an area under the receiver operating characteristic curve (AUC) of 0.86 in the training set and 0.70 in the test set. It significantly outperformed the other models (P < 0.001). SHAP analysis identified original_ngtdm_Coarseness, original_ngtdm_Strength, and wavelet.LL_glcm_ClusterProminence as the top predictors. These features reflect structural compactness and heterogeneity associated with EGFR overexpression.
Conclusions: We present a reliable and interpretable tool for non-invasively assessing EGFR expression status in patients with breast cancer. The most important predictors captured tumor heterogeneity and microstructural uniformity, highlighting the biological relevance of radiomic patterns in EGFR-positive tumors. The model integrates advanced imaging analysis with machine learning, underscoring the potential of radiomics to advance precision oncology.
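The stratified 7:3 split and the stratified 10-fold cross-validation described in the Methods can be illustrated with a minimal, standard-library sketch. This is not the authors' code: the class counts (100 EGFR-positive, 221 EGFR-negative) are hypothetical, since the description does not report the class balance, and the helper names `stratified_split` and `stratified_folds` are invented for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        # Take the same fraction from every class so the ratio is preserved.
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample position to one of k folds, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        # Deal samples round-robin; the offset keeps fold sizes balanced
        # when one class does not divide evenly by k.
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    return folds


# Hypothetical labels: 321 patients, 1 = EGFR-positive, 0 = EGFR-negative.
labels = [1] * 100 + [0] * 221
train_idx, test_idx = stratified_split(labels)

# Folds index into the training subset, not into the full dataset.
train_labels = [labels[i] for i in train_idx]
folds = stratified_folds(train_labels)
```

With these assumed counts the split yields 225 training and 96 test samples, and the positive fraction stays near 31% in both sets. Real class counts may round differently, and library implementations such as scikit-learn use their own rounding rules.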
Created:
2025-10-17



