Machine learning algorithms for diabetic kidney disease risk predictive model of Chinese patients with type 2 diabetes mellitus
Taylor & Francis Group | Updated: 2025-05-14 | Indexed: 2026-04-16
Download link:
https://tandf.figshare.com/articles/dataset/Machine_learning_algorithms_for_diabetic_kidney_disease_risk_predictive_model_of_Chinese_patients_with_type_2_diabetes_mellitus/28748457/1
Description:
Diabetic kidney disease (DKD) is a common and serious complication of diabetes mellitus (DM). More sensitive methods for early DKD prediction are urgently needed. This study aimed to build DKD risk prediction models based on machine learning algorithms (MLAs) in patients with type 2 DM (T2DM). The electronic health records of 12,190 T2DM patients with 3-year follow-up were extracted, and the dataset was divided into training and testing sets in a 4:1 ratio. The risk variables for DKD development were ranked and selected to establish forecasting models. Model performance was then evaluated on the testing set using sensitivity, specificity, positive predictive value, negative predictive value, accuracy, and F1 score. Accuracy was used to select the optimal model. Using the importance ranking in the random forest package, the variables of age, urinary albumin-to-creatinine ratio, serum cystatin C, estimated glomerular filtration rate, and neutrophil percentage were selected as the predictors of DKD onset. Among the seven forecasting models constructed with MLAs, the Light Gradient Boosting Machine (LightGBM) model had the highest accuracy, indicating that the LightGBM algorithm might perform best for predicting the 3-year risk of DKD onset. Our study could provide powerful tools for early DKD risk prediction, which might help optimize intervention strategies and improve the renal prognosis of T2DM patients. DKD occurs in 20-40% of patients with DM and has become the leading cause of chronic kidney disease (CKD). Early detection of and intervention for DKD are strongly recommended to improve renal and cardiovascular outcomes in DM patients.
DKD is clinically diagnosed by persistent albuminuria and reduced estimated glomerular filtration rate (eGFR). However, about 25% of patients have sustained kidney injury without albuminuria, and serum creatinine starts to increase only after more than 50% of nephrons are injured, so both albuminuria and reduced eGFR should be considered late indexes for DKD screening. Constructing models for DKD prediction could provide a noninvasive and accurate tool for early diagnosis and intervention, thereby delaying DKD development in DM patients. This study explores novel methods of establishing and verifying models for DKD prediction using machine learning algorithms (MLAs) and traditional clinical data from electronic health records (EHRs). Age, urinary albumin-to-creatinine ratio (UACR), serum cystatin C, eGFR, and neutrophil percentage could be predictors of DKD onset in Chinese T2DM patients. Seven forecasting models were constructed using the MLAs of Logistic Regression, Decision Tree, Support Vector Machines, Random Forests, Naive Bayes, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). Of these, LightGBM had the highest accuracy. The study constructs and validates a machine learning framework to predict long-term DKD onset in T2DM patients, which could automatically identify risk factors associated with DKD and provide personalized DKD monitoring tools for T2DM patients. This framework combines MLAs with clinical big data from EHRs and could serve as a screening tool for patient selection in related clinical trials. Our results could provide useful tools for predicting DKD development and enable early screening and intervention, thereby helping to improve renal outcomes in T2DM patients.
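The description names a 4:1 training/testing split and six evaluation indexes without defining them. The sketch below shows one conventional way to compute both, using only the Python standard library. The function names `split_4_to_1` and `binary_metrics`, the random seed, and the toy labels are hypothetical illustrations and are not taken from the dataset or the authors' code; the authors' actual splitting procedure (for example, whether it was stratified) is not stated in the source.

```python
import random


def split_4_to_1(n_samples, seed=0):
    """Shuffle row indices and return (train_idx, test_idx) in a 4:1 ratio.

    Assumption: a plain random shuffle; the source does not say how the
    authors performed their split.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = n_samples // 5  # one fifth goes to the testing set
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def binary_metrics(y_true, y_pred):
    """Compute the six indexes from a binary confusion matrix.

    Labels are 1 (DKD onset) and 0 (no DKD onset).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
        "accuracy": ratio(tp + tn, len(pairs)),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are NOT study data.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
    train_idx, test_idx = split_4_to_1(len(y_true), seed=42)
    print(len(train_idx), len(test_idx))
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

Under this scheme, selecting the optimal model by accuracy amounts to computing `binary_metrics` on the testing set for each candidate model and keeping the one with the largest `"accuracy"` value.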
Provided by:
Hou, Zhi-Li; Wang, Xue; Lu, Jiang-Tao; Sun, Ling; Zou, Lu-Xi
Created:
2025-04-08



