Machine Learning Identifies Plasma Proteomic Signatures of Descending Thoracic Aortic Disease
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://www.omicsdi.org/dataset/pride/PXD041337
Description:
Descending thoracic aortic aneurysms and dissections can go undetected until they become severe or catastrophic, and few clinical indices exist to screen for aneurysms or to predict their risk of dissection or rupture. This study generated a plasma proteomic dataset from 150 patients with descending thoracic aortic disease and 52 controls to identify proteomic signatures that differentiate descending thoracic aortic disease from non-disease controls, and aneurysm from descending ‘type B’ dissection. Of the 1,468 peptides and 195 proteins quantified across all samples, 853 peptides and 99 proteins differed quantitatively between disease and control patients (Benjamini-Hochberg (BH) adjusted p-value < 0.01 from t-tests). Supervised machine learning (ML) methods were used to classify disease versus control and aneurysm versus descending dissection. On the held-out test set, the highest precision-recall area under the curve (PR AUC) was achieved with the proteins that differed significantly between disease and control patients as input (PR AUC 0.99), followed by the significant peptides (PR AUC 0.96). Although no proteins differed significantly between aneurysm and dissection cases, a model using all proteins separated the two disease states with modest performance (PR AUC 0.77). To overcome correlation among the proteins and enable biological pathway analysis, a disease versus control classifier was optimized using only seven unique protein clusters; it achieved performance comparable to models trained on all or significant proteins (accuracy 0.88, F1-score 0.78, PR AUC 0.90). Model interpretation with permutation importance showed that proteins in the clusters most important for differentiating disease from control function in coagulation, protein-lipid complex remodeling, and acute inflammatory response.
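The description relies on two quantities that are easy to misread: the BH adjusted p-value and the PR AUC. The sketch below illustrates both in plain Python, under stated assumptions. It is not the study's pipeline: the function names `benjamini_hochberg` and `average_precision` are hypothetical, the inputs in the usage note are made-up toy values, and average precision is only one common estimator of PR AUC (the estimator used in the study is not specified here). The second function also assumes that no two scores are tied.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Illustrative helper (hypothetical name), not the study's code.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values stay monotone and never exceed 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def average_precision(labels, scores):
    """Average precision, one common estimator of PR AUC.

    labels: 1 for the positive class (e.g. disease), 0 otherwise.
    scores: classifier scores, higher means more likely positive.
    Assumes no tied scores (ties would need threshold-level grouping).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives = 0
    total = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank of each positive sample.
            total += true_positives / k
    return total / n_pos
```

With toy inputs, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` adjusts each raw p-value upward for the four tests performed, and a feature would be called significant at the threshold quoted above only if its adjusted value is below 0.01. `average_precision([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.1])` averages the precision at the rank of each positive sample.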
Created:
2024-06-16



