
Table 1_Shapley additive explanations based feature selection reveals CXCL14 as a key immune-related gene in predicting idiopathic pulmonary fibrosis.xlsx

Source: NIAID Data Ecosystem; indexed 2026-05-02
Download link:
https://figshare.com/articles/dataset/Table_1_Shapley_additive_explanations_based_feature_selection_reveals_CXCL14_as_a_key_immune-related_gene_in_predicting_idiopathic_pulmonary_fibrosis_xlsx/29833742
Resource description:
Background: Idiopathic pulmonary fibrosis (IPF) is a progressive lung disease marked by excessive fibrous tissue accumulation in the lung interstitium, leading to a gradual deterioration of respiratory function and significantly impairing patients’ quality of life. Despite advances in understanding its etiology and pathogenesis, the exact mechanisms remain unclear, underscoring the need for novel biomarkers and therapeutic targets.

Methods: We analyzed five publicly available datasets from the Gene Expression Omnibus (GEO), specifically “GSE15197,” “GSE53845,” “GSE135065,” “GSE185691,” and “GSE195770,” to identify gene expression changes associated with IPF. Data were annotated and normalized to minimize batch effects and technical variability. Principal Component Analysis (PCA) verified preprocessing efficacy. Differentially expressed genes (DEGs) were identified using linear modeling. Core DEGs were selected via integrative analysis across datasets.

Results: Our analysis revealed DEGs that are substantially linked to crucial biological processes such as extracellular matrix organization and immune response regulation. Integrative analysis of five GEO datasets identified CXCL14, MMP7, and MDK as core differentially expressed genes in the final predictive model. Using Least Absolute Shrinkage and Selection Operator (LASSO) regression and Random Forest, we constructed a logistic regression model with robust predictive performance, achieving an AUC of 0.92 in the training cohort and 0.89 in the validation cohort, with sensitivity of 88% and specificity of 85%. The Shapley Additive Explanations (SHAP) method identified CXCL14 (mean SHAP value = 0.38) as the most influential feature, followed by MMP7 and MDK. Functional enrichment analyses highlighted significant enrichment of TGF-β signaling, extracellular matrix organization, and chemokine signaling pathways. Immune infiltration analysis revealed positive correlations between CXCL14 expression and alveolar macrophage/activated fibroblast populations, while SHAP interaction analysis identified synergistic effects between CXCL14 and TGF-β1 in driving fibrosis.

Conclusion: These findings substantiate the hypothesis that IPF pathogenesis is closely linked to extracellular matrix remodeling and immune dysregulation. This suggests that future investigations should delve deeper into the practical applications of identified biomarkers in the early diagnosis and management of IPF. Furthermore, the machine learning-based predictive model demonstrates strong clinical potential and merits further validation in prospective trials to assess its utility and therapeutic implications in real-world settings.
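
To illustrate how SHAP can rank features in a logistic regression model like the one described above, here is a minimal sketch that computes exact Shapley values by enumerating feature coalitions. Everything numeric in it is an assumption made for illustration: the weights, intercept, and expression values are invented and are not taken from the study or from the dataset file, and ranking by mean absolute SHAP value is one common convention that the study may or may not have followed. Absent features are replaced by their baseline (mean) value, which corresponds to the usual independent-features simplification.

```python
import math
from itertools import combinations

# Hypothetical log-odds weights for the three genes named in the abstract.
# Illustrative values only; they are NOT the study's fitted coefficients.
WEIGHTS = {"CXCL14": 1.2, "MMP7": 0.8, "MDK": 0.5}
INTERCEPT = -0.3

# Synthetic standardized expression values (invented, not from GEO).
SAMPLES = [
    {"CXCL14": 1.1, "MMP7": 0.4, "MDK": -0.2},
    {"CXCL14": -0.7, "MMP7": 0.9, "MDK": 0.3},
    {"CXCL14": 0.2, "MMP7": -1.0, "MDK": 0.8},
    {"CXCL14": -0.6, "MMP7": -0.3, "MDK": -0.9},
]


def log_odds(x):
    """Linear predictor (log-odds) of the logistic regression model."""
    return INTERCEPT + sum(WEIGHTS[g] * x[g] for g in WEIGHTS)


def coalition_value(sample, baseline, present):
    """Model output when genes in `present` come from the sample
    and all other genes are set to their baseline value."""
    mixed = {g: (sample[g] if g in present else baseline[g]) for g in WEIGHTS}
    return log_odds(mixed)


def shapley_values(sample, baseline):
    """Exact Shapley values, enumerating every coalition of the other genes."""
    genes = list(WEIGHTS)
    n = len(genes)
    phi = {}
    for g in genes:
        others = [h for h in genes if h != g]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            # Shapley weight for a coalition of size k.
            w = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in combinations(others, k):
                with_g = coalition_value(sample, baseline, set(subset) | {g})
                without_g = coalition_value(sample, baseline, set(subset))
                total += w * (with_g - without_g)
        phi[g] = total
    return phi


def mean_abs_shap(samples, baseline):
    """Average |Shapley value| per gene across samples (global importance)."""
    sums = {g: 0.0 for g in WEIGHTS}
    for s in samples:
        for g, v in shapley_values(s, baseline).items():
            sums[g] += abs(v)
    return {g: total / len(samples) for g, total in sums.items()}


# Baseline = mean expression of each gene over the synthetic samples.
BASELINE = {g: sum(s[g] for s in SAMPLES) / len(SAMPLES) for g in WEIGHTS}

if __name__ == "__main__":
    importance = mean_abs_shap(SAMPLES, BASELINE)
    for gene, value in sorted(importance.items(), key=lambda kv: -kv[1]):
        print(f"{gene}: mean |SHAP| = {value:.3f}")
```

For a linear predictor, the enumerated values reduce to weight × (value − baseline) per gene, and they sum to the difference between the sample's log-odds and the baseline's log-odds. In practice a library such as `shap` would be used on the fitted model instead of hand enumeration, which grows exponentially with the number of features.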

Created:
2025-08-06