Table 2_Explainable machine learning for early detection of Escherichia coli urinary tract infections: integrating SHAP interpretation and bacterial epidemiology.docx
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Table_2_Explainable_machine_learning_for_early_detection_of_Escherichia_coli_urinary_tract_infections_integrating_SHAP_interpretation_and_bacterial_epidemiology_docx/31331611
Description:
Background: Escherichia coli is the predominant uropathogen in urinary tract infections (UTIs), but culture-based identification is time-consuming. This study aimed to develop an explainable, culture-independent model to distinguish E. coli from other uropathogens using routinely collected clinical data.
Methods: We retrospectively analyzed 308 hospitalized patients with culture-confirmed UTIs at Fuding Hospital, Fujian University of Traditional Chinese Medicine (January–December 2023), classified as E. coli (n = 158) or non–E. coli (n = 150). Species identification was performed using an automated microbiology system. Nineteen predictors (sex, urinary leukocyte grade, and 17 routine laboratory variables) were used. Associations with E. coli UTI were examined using univariate and multivariable logistic regression. A Random Forest (RF) classifier was developed with SHapley Additive exPlanations (SHAP) for interpretability. Data were split using a stratified 70/30 train–test split; 5-fold stratified cross-validation within the training set was used for hyperparameter tuning, and final performance (discrimination and calibration) was reported on the held-out test set. RF was additionally benchmarked against regularized logistic regression, calibrated linear SVM, and gradient boosting using the same protocol.
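The splitting protocol described in the Methods can be illustrated with a minimal, standard-library sketch. It is not the authors' code: the function names `stratified_split` and `stratified_folds`, the random seed, and the rounding rule for the test fraction are all assumptions made for illustration, and only the class counts (158 and 150) come from the abstract. Other tools round differently, so the exact train and test sizes used in the study may differ from those produced here.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.30, seed=0):
    """Return (train_idx, test_idx), splitting each class at roughly test_frac."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)  # rounding rule is an assumption
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def stratified_folds(indices, labels, k=5, seed=0):
    """Deal the given indices into k folds, class by class, round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


# Class counts from the abstract: 158 E. coli (1) and 150 non-E. coli (0).
labels = [1] * 158 + [0] * 150
train_idx, test_idx = stratified_split(labels)
folds = stratified_folds(train_idx, labels)
print(len(train_idx), len(test_idx), [len(f) for f in folds])
```

In a tuning loop, each fold would serve once as the validation set while the other four are used for fitting; the held-out test indices are touched only once, for the final performance report.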
Results: E. coli accounted for 51.3% of isolates, followed by Enterococcus spp. (18.5%) and Klebsiella spp. (7.8%). Compared with non–E. coli cases, E. coli infections were more common in females and showed higher lymphocyte counts (LYM), alanine aminotransferase (ALT), and albumin (ALB) (all P < 0.05). Multivariable logistic regression identified sex, LYM, and urinary leukocyte grade as independent predictors. On the held-out test set, RF achieved moderate discrimination (ROC-AUC = 0.66; average precision = 0.66) with calibration assessed by Brier score and calibration slope. SHAP highlighted Sex, LYM, and ALT as the most influential predictors and revealed patient-level heterogeneity in feature effects.
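For readers unfamiliar with the metrics named in the Results, the following sketch shows one common way to compute ROC-AUC, average precision, and the Brier score from predicted probabilities. The function names and the four-patient toy data are made up for illustration and are not the study's data or code. The average-precision function here assumes no tied scores, which library implementations handle more carefully.

```python
def roc_auc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Mean of the precision values at each positive, ranked by descending score."""
    ranked = sorted(zip(y_score, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy example (illustrative only): 1 = E. coli, 0 = non-E. coli.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_prob), average_precision(y_true, y_prob), brier_score(y_true, y_prob))
```

An ROC-AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so the reported 0.66 sits between the two. For the Brier score, lower values indicate better-calibrated and sharper probabilities.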
Conclusions: E. coli remains the predominant pathogen among hospitalized UTIs. An explainable RF model using routine laboratory variables provided moderate, reproducible discrimination of E. coli vs non–E. coli UTIs and may support earlier decision-making while awaiting culture results.
Created: 2026-02-13



