Table 1_Associations between metabolic-inflammatory biomarkers and Helicobacter pylori infection: an interpretable machine learning prediction approach.docx

NIAID Data Ecosystem, indexed 2026-05-10

Download link:

https://figshare.com/articles/dataset/Table_1_Associations_between_metabolic-inflammatory_biomarkers_and_Helicobacter_pylori_infection_an_interpretable_machine_learning_prediction_approach_docx/30656969

Description:

Background: This study investigated the association between metabolic-inflammatory markers and Helicobacter pylori (HP) infection using interpretable machine learning models, with a focus on the triglyceride-glucose (TyG) index, TyG/HDL-C ratio, and systemic inflammatory biomarkers.

Methods: Data from 2,924 NHANES participants and 1,021 patients from the Second Hospital of Jilin University were analyzed. Associations between metabolic-inflammatory markers and HP were assessed using multivariable regression. Eleven machine learning models were compared for predictive performance, evaluated by AUC, accuracy, sensitivity, specificity, precision, F1 score, and Kappa statistic. Interpretability was assessed via SHAP values, calibration plots, confusion matrices, and decision curve analysis.

Results: In NHANES, the TyG index was independently associated with HP infection (OR = 1.25, 95% CI 1.06–1.48, P = 0.009), and the TyG/HDL-C ratio remained significant after full adjustment (OR = 1.16, 95% CI 1.07–1.25, P < 0.001), while SIRI, IBI, and CRP lost significance. In the external Chinese cohort, the TyG association attenuated (P = 0.057), but higher TyG/HDL-C quartiles remained significant. Among 11 algorithms, Random Forest (RF) and Gaussian Process (GP) achieved the highest AUCs on the training set (both 0.97) but dropped markedly on the validation set (both 0.75), indicating overfitting. In contrast, XGBoost (XGB) and MLP maintained more consistent AUCs between training (0.77) and validation (0.77), reflecting better generalization. DeLong's test indicated that both RF and XGB significantly outperformed baseline models (P < 0.001), while XGB demonstrated more stable validation performance. Decision curve and SHAP analyses supported the clinical relevance of XGB, highlighting Race and Age as dominant contributors.

Conclusion: The TyG index and TyG/HDL-C ratio were independently associated with HP infection. Among machine learning models, XGBoost demonstrated the most stable and generalizable performance (AUC 0.77 in both training and validation), whereas RF and GP (AUC 0.97 → 0.75) exhibited overfitting. These results suggest that XGB provides a more reliable framework for infection risk prediction, though the cross-sectional design precludes causal inference.
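
The figures reported in the abstract can be sanity-checked with a few lines of Python. The sketch below is not the authors' code: the function names are hypothetical, and the p-value check assumes the confidence intervals are Wald-type intervals, symmetric on the log-odds scale, which the abstract does not state. It recovers an approximate two-sided p-value from each odds ratio and its 95% CI, and computes the training-to-validation AUC gap the abstract uses as its overfitting signal.

```python
import math


def two_sided_p_from_or(or_value, ci_low, ci_high, z_crit=1.96):
    """Approximate a two-sided p-value from an odds ratio and its 95% CI.

    Assumption (not stated in the abstract): the interval is a Wald interval,
    so it is symmetric around log(OR) and its width gives the standard error.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(or_value) / se
    # Two-sided tail probability of a standard normal variable.
    return math.erfc(abs(z) / math.sqrt(2))


def auc_gap(train_auc, valid_auc):
    """Training AUC minus validation AUC; a large gap suggests overfitting."""
    return round(train_auc - valid_auc, 2)


# (training AUC, validation AUC) as reported in the abstract.
reported_auc = {
    "RF": (0.97, 0.75),
    "GP": (0.97, 0.75),
    "XGB": (0.77, 0.77),
    "MLP": (0.77, 0.77),
}

p_tyg = two_sided_p_from_or(1.25, 1.06, 1.48)    # TyG index, NHANES
p_ratio = two_sided_p_from_or(1.16, 1.07, 1.25)  # TyG/HDL-C ratio, NHANES
gaps = {name: auc_gap(tr, va) for name, (tr, va) in reported_auc.items()}

print(f"TyG index: p ~ {p_tyg:.4f}")
print(f"TyG/HDL-C ratio: p ~ {p_ratio:.5f}")
for name, gap in gaps.items():
    print(f"{name}: train-validation AUC gap = {gap:.2f}")
```

Under these assumptions the recovered p-values agree with the reported P = 0.009 and P < 0.001, and the gap is 0.22 for RF and GP versus 0.00 for XGB and MLP. The abstract gives no threshold for how large a gap counts as overfitting, so the sketch reports the gap and leaves that judgment to the reader.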

Created:

2025-11-19