Supplementary Material for: Predicting CKD in Type 2 Diabetes Using Natural Language Processing on Healthcare Data
Source: DataCite Commons. Updated: 2025-10-27. Indexed: 2026-02-09.
Download link:
https://karger.figshare.com/articles/dataset/Supplementary_Material_for_Predicting_CKD_in_Type_2_Diabetes_Using_Natural_Language_Processing_on_Healthcare_Data/30451754
Description:
Background: Persons with type 2 diabetes mellitus (T2DM) attending hospitals frequently experience major complications. We assessed the potential use of unstructured free-text data extracted from electronic health records (EHRs) using natural language processing (NLP) and machine learning (ML) to develop a predictive model for chronic kidney disease (CKD) in T2DM.
Methods: This multicenter retrospective study included data from eight Spanish hospitals (2013–2018), extracted using NLP and ML techniques (EHRead®) based on SNOMED CT terminology. From a cohort of individuals with T2DM, we identified those with and without CKD at inclusion. Among individuals without CKD, we trained and validated a two-year predictive model for CKD development. The model showing the best balance between performance and clinical interpretability was selected for integration into a web-based tool to support early detection and risk stratification.
Results: Of 588,786 individuals with T2DM, 316,597 were included for model development [training: 291,429 (92.1%); validation: 25,168 (7.9%); CKD incidence: 15.4% and 18.4%, respectively]. A high proportion of missing data was observed in key clinical variables. Among models evaluated, logistic regression (LR) achieved the best performance (AUC-ROC 0.72) using 27 predictors. Both a reduced 10-predictor model and a clinically refined 8-predictor model showed comparable performance to the full model in training and validation cohorts. The clinically refined model was selected for implementation in the web-based tool.
Conclusions: Unstructured EHR data enabled the development of a predictive model for two-year CKD risk in persons with T2DM. Improving EHR data completeness remains essential to enhance future predictive modeling.
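To make the reported figures concrete, the sketch below does two things: it rechecks the training/validation split percentages from the cohort counts given in the Results, and it shows how an AUC-ROC value can be read as the probability that a randomly chosen individual who developed CKD receives a higher predicted risk than one who did not. It is a minimal illustration, not the authors' pipeline. The function name `auc_roc` and the toy labels and scores are assumptions made up for this example; only the cohort counts come from the abstract.

```python
# Cohort counts taken from the Results paragraph of the abstract.
TRAIN_N = 291_429
VALID_N = 25_168
TOTAL_N = 316_597


def split_percentages(train_n, valid_n):
    """Return (train %, validation %) of the combined cohort, rounded to 1 decimal."""
    total = train_n + valid_n
    return round(100 * train_n / total, 1), round(100 * valid_n / total, 1)


def auc_roc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of AUC-ROC.

    labels: iterable of 0/1 outcomes (1 = developed CKD in this toy setting).
    scores: predicted risks, same length as labels.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(split_percentages(TRAIN_N, VALID_N))

    # Hypothetical toy data, NOT from the study.
    toy_labels = [0, 0, 1, 0, 1, 1]
    toy_scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
    print(auc_roc(toy_labels, toy_scores))
```

Under this reading, the reported AUC-ROC of 0.72 means that in roughly 72% of such case/non-case pairs the model ranks the individual who went on to develop CKD higher. The pairwise loop is quadratic in cohort size, so a rank-based implementation would be preferable at the scale of the study cohort.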
Provider:
Karger Publishers
Created:
2025-10-27



