Data_Sheet_2_Machine Learning Prediction Models for Mechanically Ventilated Patients: Analyses of the MIMIC-III Database.PDF
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/Data_Sheet_2_Machine_Learning_Prediction_Models_for_Mechanically_Ventilated_Patients_Analyses_of_the_MIMIC-III_Database_PDF/14890818
Description:
Background: Mechanically ventilated patients in the intensive care unit (ICU) have high mortality rates. Several prediction scores, such as the Simplified Acute Physiology Score II (SAPS II), the Oxford Acute Severity of Illness Score (OASIS), and the Sequential Organ Failure Assessment (SOFA), are widely used in the general ICU population. We aimed to establish prediction scores for mechanically ventilated patients by combining these disease severity scores with other features available on the first day of admission.
Methods: We conducted a retrospective administrative database study using the Medical Information Mart for Intensive Care (MIMIC-III) database. The exposures of interest were demographics, pre-ICU comorbidities, ICU diagnosis, disease severity scores, vital signs, and laboratory test results on the first day of ICU admission. Hospital mortality was the outcome. Models were built with k-nearest neighbors (KNN), logistic regression, bagging, decision tree, random forest, Extreme Gradient Boosting (XGBoost), and neural network methods. A sample of 70% of the cohort was used for training; the remaining 30% was used for testing. Areas under the receiver operating characteristic curves (AUCs) and calibration plots were used to evaluate and compare the models' performance. The significance of the risk factors was assessed through the models, and the top factors were reported.
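As an illustration of the evaluation setup described in the Methods, the sketch below shows a 70/30 split and an AUC computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. This is not the authors' code: the function names `split_70_30` and `auc`, the seed, and the toy data are assumptions made for the example, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle rows and split them into a 70% training set and a 30% testing set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed (assumed) for reproducibility
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """AUC via pairwise comparison: P(score of positive > score of negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example (hypothetical data): 1 = died in hospital, 0 = survived.
    train, test = split_70_30(range(10))
    print(len(train), len(test))
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form is quadratic in the number of cases, so it is only meant to make the definition concrete; rank-based implementations give the same value far more efficiently on a cohort of this size.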
Results: A total of 28,530 subjects were enrolled through screening of the MIMIC-III database. After data preprocessing, 25,659 adult patients with 66 predictors were included in the model analyses. The KNN, logistic regression, decision tree, random forest, neural network, bagging, and XGBoost models were established on the training set and achieved AUCs of 0.806, 0.818, 0.743, 0.819, 0.780, 0.803, and 0.821, respectively, on the testing set. The calibration curves were good for all models except the neural network. The XGBoost model performed best among the seven models. The top five predictors were age, respiratory dysfunction, SAPS II score, maximum hemoglobin, and minimum lactate.
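For readers unfamiliar with calibration curves, the following minimal sketch shows one common way such a curve can be tabulated: predicted probabilities are grouped into equal-width bins, and the mean predicted risk in each bin is compared with the observed event rate. The function name `calibration_bins`, the number of bins, and the toy inputs are assumptions for illustration; the source does not state how the authors built their plots.

```python
def calibration_bins(labels, probs, n_bins=5):
    """Return (mean predicted probability, observed event rate, count) for each non-empty bin."""
    # One accumulator per equal-width bin over [0, 1]: [sum of probs, sum of labels, count].
    acc = [[0.0, 0, 0] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        acc[idx][0] += p
        acc[idx][1] += y
        acc[idx][2] += 1
    return [(sp / n, sy / n, n) for sp, sy, n in acc if n > 0]


if __name__ == "__main__":
    # Toy example (hypothetical data): a well-calibrated model has mean predicted
    # probability close to the observed rate in every bin.
    for mean_p, rate, n in calibration_bins([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9], n_bins=2):
        print(mean_p, rate, n)
```

Plotting the first value against the second for each bin gives the calibration curve; points near the diagonal indicate good agreement between predicted and observed mortality.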
Conclusion: The current study indicates that models based on risk factors available on the first day can be successfully established for predicting mortality in ventilated patients. The XGBoost model performed best among the seven machine learning models.
Created:
2021-07-01



