Data_Sheet_1_Explainable Machine Learning to Predict Successful Weaning Among Patients Requiring Prolonged Mechanical Ventilation: A Retrospective Cohort Study in Central Taiwan.docx

NIAID Data Ecosystem, indexed 2026-03-12

Download link:

https://figshare.com/articles/dataset/Data_Sheet_1_Explainable_Machine_Learning_to_Predict_Successful_Weaning_Among_Patients_Requiring_Prolonged_Mechanical_Ventilation_A_Retrospective_Cohort_Study_in_Central_Taiwan_docx/14471892


Description:

Objective: The number of patients requiring prolonged mechanical ventilation (PMV) is increasing worldwide, but models that predict weaning outcomes in these patients are still lacking. We therefore aimed to develop an explainable machine learning (ML) model to predict successful weaning in patients requiring PMV using a real-world dataset. Methods: This retrospective study used the electronic medical records of patients admitted to a 12-bed respiratory care center in central Taiwan between 2013 and 2018. We used three ML models, namely extreme gradient boosting (XGBoost), random forest (RF), and logistic regression (LR), to establish the prediction model. We further illustrated feature importance categorized by clinical domain and provided visual interpretation using SHapley Additive exPlanations (SHAP) and local interpretable model-agnostic explanations (LIME). Results: The dataset contained data on 963 patients requiring PMV, of whom 56.0% (539/963) were successfully weaned from mechanical ventilation. The XGBoost model (area under the curve [AUC]: 0.908; 95% confidence interval [CI] 0.864–0.943) and the RF model (AUC: 0.888; 95% CI 0.844–0.934) outperformed the LR model (AUC: 0.762; 95% CI 0.687–0.830) in predicting successful weaning in patients requiring PMV. To give physicians an intuitive understanding of the model, we stratified feature importance by clinical domain. The cumulative feature importance in the ventilation, fluid, physiology, and laboratory data domains was 0.310, 0.201, 0.265, and 0.182, respectively. We further used SHAP plots and partial dependence plots to illustrate associations between features and the weaning outcome at the feature level, and LIME plots to illustrate the prediction model at the individual level. Additionally, we assessed the weekly performance of the three ML models and found that the accuracy of XGBoost and RF was approximately 0.7 between weeks 4 and 7 and declined slightly to 0.6 in weeks 8 and 9. Conclusion: We used an ML approach, mainly XGBoost, SHAP plots, and LIME plots, to establish an explainable weaning prediction model for patients requiring PMV. We believe these approaches should largely mitigate concerns about the black-box nature of artificial intelligence, and future studies are warranted to bring the proposed model into clinical use.
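The domain-level stratification of feature importance mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each feature has a single importance score and belongs to exactly one clinical domain, and that a domain's cumulative importance is the sum of its features' scores. The function name, feature names, domain assignments, and numbers below are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def cumulative_importance_by_domain(importances, feature_domain):
    """Sum per-feature importance scores within each clinical domain."""
    totals = defaultdict(float)
    for feature, value in importances.items():
        # Features without a domain assignment are grouped under "other".
        totals[feature_domain.get(feature, "other")] += value
    return dict(totals)


# Hypothetical feature names and importance values, for illustration only;
# they are NOT the features or results reported in the study.
importances = {
    "peak_airway_pressure": 0.20,
    "fio2": 0.15,
    "fluid_balance": 0.25,
    "heart_rate": 0.10,
    "respiratory_rate": 0.10,
    "albumin": 0.20,
}

# Hypothetical mapping from each feature to one clinical domain.
feature_domain = {
    "peak_airway_pressure": "ventilation",
    "fio2": "ventilation",
    "fluid_balance": "fluid",
    "heart_rate": "physiology",
    "respiratory_rate": "physiology",
    "albumin": "laboratory",
}

by_domain = cumulative_importance_by_domain(importances, feature_domain)

# Print domains from most to least important.
for domain, total in sorted(by_domain.items(), key=lambda kv: -kv[1]):
    print(f"{domain:12s} {total:.3f}")
```

In practice the per-feature scores would come from the fitted model (for example a tree ensemble's importance scores), and the domain mapping would be defined by clinicians. Neither is specified in this listing.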


Created:

2021-04-23
