Predicting Ventilator-Associated Pneumonia in ICU Patients with Type 2 Diabetes — Data Preprocessing, Baseline Features, Correlation Analysis, Model Evaluation, the Web-based Calculator, and the TRIPOD-AI Guideline
Source: DataCite Commons | Updated: 2025-12-16 | Indexed: 2026-02-09
Download link:
https://figshare.com/articles/dataset/Data_Preprocessing_Baseline_Characteristics_Variable_Correlation_Analysis_and_Model_Evaluation_for_Predicting_Ventilator-Associated_Pneumonia_in_ICU_Patients_with_Type_2_Diabetes_Mellitus/30454706/9
Description:
This dataset contains the baseline characteristics and supplementary data from a study of ICU patients with type 2 diabetes mellitus (T2DM), aiming to predict ventilator-associated pneumonia (VAP) using machine learning. The baseline characteristics table summarizes patient demographics, vital signs, and laboratory measurements. Supplementary figures illustrate the data preprocessing steps (histograms and boxplots before and after interquartile range cleaning), missing value imputation using the Random Forest method, variable correlation analysis (Spearman correlation heatmap), model evaluation (confusion matrices of four predictive models), and the Web-based Calculator. In addition, the dataset includes a file summarizing the TRIPOD-AI guideline used for model reporting. These data provide a detailed overview of feature selection, data cleaning procedures, and model performance assessment.

Fig. S1. Histograms and boxplots of Glucose_max and SBP_max in original and cleaned datasets: Glusco_max, maximum blood glucose; SBP_max, maximum systolic blood pressure. (A) original Glusco_max; (B) cleaned Glusco_max; (C) original SBP_max; (D) cleaned SBP_max.

Fig. S2. Histograms and boxplots of Temp_min and WBC_min in original and cleaned datasets: Temp_min, minimum body temperature; WBC_min, minimum white blood cell count. (A) original Temp_min; (B) cleaned Temp_min; (C) original WBC_min; (D) cleaned WBC_min.

Fig. S3. Histograms of PH_max and PH_min in original and Random Forest–imputed datasets: PH_max, maximum pH; PH_min, minimum pH.

Fig. S4. Histograms of PO2_max and PO2_min in original and Random Forest–imputed datasets: PO2_max, maximum partial pressure of oxygen; PO2_min, minimum partial pressure of oxygen.

Fig. S5. Histograms of PT_max and PT_min in original and Random Forest–imputed datasets: PT_max, maximum prothrombin time; PT_min, minimum prothrombin time.

Fig. S6. Spearman correlation heatmap of variables selected by both the Boruta algorithm and LASSO regression: Hypertension, history of hypertension; Temp_min, minimum body temperature; Glusco_max, maximum blood glucose; Scr_max, maximum serum creatinine; WBC_min, minimum white blood cell count; CNS, SOFA neurological subscore; Renal, SOFA renal subscore; and GCS, Glasgow Coma Scale.

Fig. S7. Confusion matrices of four predictive models: (A) Logistic Regression, (B) Random Forest, (C) XGBoost, and (D) Gradient Boosting Machine (GBM). Each matrix presents the counts of true positives, true negatives, false positives, and false negatives, facilitating model performance comparison.

Fig. S8. Screenshot of the web-based calculator for VAP risk prediction in ICU patients with type 2 diabetes. The calculator allows clinicians to input key patient features and obtain real-time risk predictions based on the trained machine learning model.
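
The description above mentions interquartile range (IQR) cleaning and confusion matrices without giving implementation details. The following is a minimal Python sketch of both ideas, not the authors' code. It assumes the common 1.5 × IQR fence (the record does not state the multiplier), assumes binary labels where 1 means VAP, and uses made-up sample values; the function names `iqr_bounds`, `iqr_clean`, and `confusion_counts` are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is an assumed, conventional multiplier; the dataset record
    does not say which value the study used.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_clean(values, k=1.5):
    """Keep only the values that lie inside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = VAP, 0 = no VAP)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["TP"] += 1
        elif truth == 0 and pred == 0:
            counts["TN"] += 1
        elif truth == 0 and pred == 1:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts


# Hypothetical glucose readings; 900 is an implausible outlier.
sample_glucose = [95, 110, 120, 130, 140, 150, 160, 900]
cleaned_glucose = iqr_clean(sample_glucose)

# Hypothetical true labels and model predictions.
sample_truth = [1, 0, 1, 1, 0, 0]
sample_pred = [1, 0, 0, 1, 1, 0]
sample_counts = confusion_counts(sample_truth, sample_pred)
```

With these sample values the fences are 65.0 and 205.0, so only the 900 reading is dropped. Whether the study removed out-of-range values or capped them at the fences is not stated in the record, so the removal shown here is one possible reading.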
Provider:
figshare
Created:
2025-11-20



