Supplementary Material for: Predicting CKD in Type 2 Diabetes Using Natural Language Processing on Healthcare Data

Source: Figshare. Updated 2025-10-27; indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Supplementary_Material_for_Predicting_CKD_in_Type_2_Diabetes_Using_Natural_Language_Processing_on_Healthcare_Data/30451754
Description:
Background: Persons with type 2 diabetes mellitus (T2DM) attending hospitals frequently experience major complications. We assessed the potential use of unstructured free-text data extracted from electronic health records (EHRs) using natural language processing (NLP) and machine learning (ML) to develop a predictive model for chronic kidney disease (CKD) in T2DM.

Methods: This multicenter retrospective study included data from eight Spanish hospitals (2013–2018), extracted using NLP and ML techniques (EHRead®) based on SNOMED CT terminology. From a cohort of individuals with T2DM, we identified those with and without CKD at inclusion. Among individuals without CKD, we trained and validated a two-year predictive model for CKD development. The model showing the best balance between performance and clinical interpretability was selected for integration into a web-based tool to support early detection and risk stratification.

Results: Of 588,786 individuals with T2DM, 316,597 were included for model development [training: 291,429 (92.1%); validation: 25,168 (7.9%); CKD incidence: 15.4% and 18.4%, respectively]. A high proportion of missing data was observed in key clinical variables. Among models evaluated, logistic regression (LR) achieved the best performance (AUC-ROC 0.72) using 27 predictors. Both a reduced 10-predictor model and a clinically refined 8-predictor model showed comparable performance to the full model in training and validation cohorts. The clinically refined model was selected for implementation in the web-based tool.

Conclusions: Unstructured EHR data enabled the development of a predictive model for two-year CKD risk in persons with T2DM. Improving EHR data completeness remains essential to enhance future predictive modeling.
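The cohort figures in the Results can be cross-checked with a few lines of arithmetic. The sketch below is not part of the supplementary material; the variable names and the `share` helper are illustrative assumptions, and only the four counts are taken from the abstract.

```python
# Counts as reported in the abstract; everything else here is illustrative.
total_t2dm = 588_786      # individuals with T2DM
model_cohort = 316_597    # included for model development
training = 291_429        # training cohort
validation = 25_168       # validation cohort


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The training and validation cohorts should partition the model cohort.
assert training + validation == model_cohort

training_share = share(training, model_cohort)      # reported as 92.1%
validation_share = share(validation, model_cohort)  # reported as 7.9%

# Share of the full T2DM population that entered model development
# (derived here, not stated in the abstract).
cohort_share = share(model_cohort, total_t2dm)

print(f"training: {training_share}%  validation: {validation_share}%")
print(f"model cohort as share of all T2DM: {cohort_share}%")
```

The two splits sum exactly to the model development cohort, and the recomputed shares agree with the reported 92.1% and 7.9%.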
Created: 2025-10-27