Supplementary Material for: Implementation of machine learning algorithms to screen for advanced liver fibrosis in metabolic dysfunction-associated steatotic liver disease (MASLD): an in-depth explanatory analysis
DataCite Commons | Updated: 2024-10-25 | Indexed: 2025-01-06
Download link:
https://karger.figshare.com/articles/dataset/Supplementary_Material_for_Implementation_of_machine_learning_algorithms_to_screen_for_advanced_liver_fibrosis_in_metabolic_dysfunction-associated_steatotic_liver_disease_MASLD_an_in-depth_explanatory_analysis/27302328/1
Resource description:
Background
This study aimed to train machine learning algorithms (MLAs) to detect advanced fibrosis (AF) in patients with MASLD in the primary care setting, and to explain their predictions to ensure responsible use by clinicians.
Methods
Readily available clinical features of 618 MASLD patients followed up at a tertiary center were used to train five MLAs. AF was defined as liver stiffness ≥9.3 kPa measured by two-dimensional shear wave elastography (n=495) or fibrosis stage ≥F3 on liver biopsy (n=123). The MLAs were compared with the Fibrosis-4 index (FIB-4) and the NAFLD fibrosis score (NFS) in a validation cohort of 540 MASLD patients from the primary care setting. Feature importance, partial dependence, and SHapley Additive exPlanations (SHAP) were used to explain the predictions.
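The two comparator scores are fixed published formulas, so they can be computed directly from routine laboratory values. The Python sketch below shows FIB-4, the NFS (standard published coefficients), and the reference-standard AF label defined above. It is illustrative only: the function names, the argument units (age in years, AST and ALT in U/L, platelets in 10^9/L, BMI in kg/m², albumin in g/dL), and the "exactly one measurement per patient" rule in the hypothetical `advanced_fibrosis` helper are assumptions made for this sketch, not the authors' code. The cut-offs (FIB-4 ≥1.3, NFS ≥-1.45) are the ones quoted in the Results.

```python
import math
from typing import Optional

# Low ("rule-out") cut-offs quoted in the abstract.
FIB4_CUTOFF = 1.3
NFS_CUTOFF = -1.45


def fib4(age_years: float, ast_u_l: float, alt_u_l: float, platelets_10e9_l: float) -> float:
    """FIB-4 = (age x AST) / (platelet count x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def nfs(age_years: float, bmi_kg_m2: float, ifg_or_diabetes: bool, ast_u_l: float,
        alt_u_l: float, platelets_10e9_l: float, albumin_g_dl: float) -> float:
    """NAFLD fibrosis score; ifg_or_diabetes is True for impaired fasting glucose or diabetes."""
    return (
        -1.675
        + 0.037 * age_years
        + 0.094 * bmi_kg_m2
        + 1.13 * (1 if ifg_or_diabetes else 0)
        + 0.99 * (ast_u_l / alt_u_l)
        - 0.013 * platelets_10e9_l
        - 0.66 * albumin_g_dl
    )


def advanced_fibrosis(lsm_kpa: Optional[float] = None,
                      biopsy_stage: Optional[int] = None) -> bool:
    """Hypothetical helper for the reference standard described above:
    AF if 2D-SWE liver stiffness >= 9.3 kPa, or biopsy fibrosis stage >= F3
    (stage given as an integer 0-4). Exactly one measurement must be supplied
    (an assumption of this sketch)."""
    if (lsm_kpa is None) == (biopsy_stage is None):
        raise ValueError("supply exactly one of lsm_kpa or biopsy_stage")
    if lsm_kpa is not None:
        return lsm_kpa >= 9.3
    return biopsy_stage >= 3


if __name__ == "__main__":
    # Hypothetical patient (not study data): age 60, AST 40 U/L, ALT 25 U/L,
    # platelets 200 x 10^9/L, BMI 30 kg/m^2, diabetes, albumin 4.0 g/dL.
    f = fib4(60, 40, 25, 200)
    s = nfs(60, 30, True, 40, 25, 200, 4.0)
    print(f"FIB-4 = {f:.2f}, flagged: {f >= FIB4_CUTOFF}")
    print(f"NFS = {s:.3f}, flagged: {s >= NFS_CUTOFF}")
    print(advanced_fibrosis(lsm_kpa=10.1), advanced_fibrosis(biopsy_stage=2))
```

For this hypothetical patient, FIB-4 is 2.40 and the NFS is about 0.839, so both scores exceed their low cut-offs and the patient would be flagged for further assessment.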
Results
Extreme gradient boosting (XGBoost) achieved an area under the curve (AUC) of 0.91, outperforming FIB-4 (AUC=0.78) and NFS (AUC=0.81; both p<0.05), with a specificity of 76% vs. 59% and 48% for FIB-4 ≥1.3 and NFS ≥-1.45, respectively (p<0.05). Its sensitivity (91%) was superior to that of FIB-4 (79%). XGBoost confidently excluded AF (negative predictive value, 99%) and had the highest positive predictive value (31%), superior to FIB-4 and NFS (all p<0.05). The most important features were glycated hemoglobin (HbA1c) and gamma-glutamyl transferase (GGT), with a steep increase in AF probability at HbA1c >6.5%. The strongest interaction was between aspartate aminotransferase (AST) and age. XGBoost, but not logistic regression, extracted informative patterns from alanine aminotransferase (ALT), low-density lipoprotein cholesterol (LDL-c), and alkaline phosphatase (ALP) (p<0.001). The SHAP values of GGT, platelets, and ALT were associated with false-positive (FP) classification; based on these values, one quarter of the FPs were correctly reclassified at the cost of only one additional false negative.
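The threshold metrics and the AUC reported above follow standard definitions. The Python sketch below computes sensitivity, specificity, PPV, and NPV from confusion-matrix counts, and the AUC as the Mann-Whitney probability that a randomly chosen AF case receives a higher score than a randomly chosen non-AF case. The class and function names and the example counts are hypothetical and are not taken from the study; the sketch does not reproduce the authors' statistical comparisons.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionCounts:
    tp: int  # AF present, model positive
    fp: int  # AF absent, model positive
    tn: int  # AF absent, model negative
    fn: int  # AF present, model negative


def screening_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard threshold-based metrics for a binary screening test."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def auc_mann_whitney(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outscores a negative case.
    Ties count one half. Simple O(n*m) pairwise version for illustration."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical cohort of 1000 patients with 10% AF prevalence (not study data).
    counts = ConfusionCounts(tp=90, fp=180, tn=720, fn=10)
    for name, value in screening_metrics(counts).items():
        print(f"{name}: {value:.3f}")
    # Toy scores: 3 of the 4 positive/negative pairs are correctly ordered.
    print(auc_mann_whitney([0.9, 0.8], [0.1, 0.85]))
```

With 90% sensitivity and 80% specificity in this hypothetical cohort, the NPV is about 0.986 while the PPV is only about 0.333, because non-AF patients outnumber AF patients nine to one. This illustrates how a screening model at low prevalence can rule out AF confidently while still producing many false positives, consistent with the pattern of high NPV and modest PPV reported above.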
Conclusions
An explainable XGBoost algorithm was shown to be superior to FIB-4 and NFS for screening for AF in MASLD patients in the primary care setting. The algorithm also proved safe for use, as clinicians can understand its predictions and flag FP classifications.
Provided by:
Karger Publishers
Created:
2024-10-25