Table_1_Machine learning-based warning model for chronic kidney disease in individuals over 40 years old in underprivileged areas, Shanxi Province.DOCX
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://figshare.com/articles/dataset/Table_1_Machine_learning-based_warning_model_for_chronic_kidney_disease_in_individuals_over_40_years_old_in_underprivileged_areas_Shanxi_Province_DOCX/21838575
Description:
Introduction
Chronic kidney disease (CKD) is a progressive disease with a high incidence whose early symptoms are easily missed. Because rural areas of China have inadequate medical check-ups and screening programmes that target only a single disease, CKD can readily progress to end-stage renal failure. This study aimed to construct an early warning model for CKD tailored to impoverished areas by applying machine learning (ML) algorithms to easily accessible parameters from ten rural areas in Shanxi Province, thereby shifting treatment earlier and improving patients’ quality of life.
Methods
From April to November 2019, opportunistic CKD screening was carried out in 10 rural areas in Shanxi Province. First, general information, physical examination data, and blood and urine specimens were collected from 13,550 subjects. Next, explanatory variables were selected using LASSO regression, and the target datasets, i.e., albuminuria-to-creatinine ratio (ACR) and α1-microglobulin-to-creatinine ratio (MCR), were balanced using the synthetic minority over-sampling technique (SMOTE). Finally, Bagging, Random Forest (RF), and eXtreme Gradient Boosting (XGBoost) were used to classify ACR outcomes and MCR outcomes.
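The balancing step can be illustrated with a minimal pure-Python sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation. The function names `smote` and `balance`, the parameters `k` and `seed`, and the convention that label 1 marks the minority class (for example, increased ACR) are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a randomly chosen
    minority row and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Other minority rows, ordered by Euclidean distance to the base row.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample class 1 (assumed to be the minority) until both classes
    have the same number of rows. Returns new lists; inputs are not mutated."""
    minority = [row for row, label in zip(X, y) if label == 1]
    n_majority = sum(1 for label in y if label == 0)
    n_new = max(0, n_majority - len(minority))
    new_rows = smote(minority, n_new, k=k, seed=seed)
    return X + new_rows, y + [1] * len(new_rows)
```

Calling `balance(X, y)` on a list of feature rows `X` and 0/1 labels `y` returns the original rows followed by the synthetic ones, so the two classes end up equal in size. A library implementation would normally be preferred in practice; the sketch only shows the interpolation that gives the technique its name.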
Results
A total of 12,330 rural residents were included in this study, with 20 explanatory variables. Increased ACR and increased MCR were found in 1,587 (12.8%) and 1,456 (11.8%) cases, respectively. After LASSO, 14 and 15 explanatory variables remained in the two datasets, respectively. Bagging, RF, and XGBoost performed well in classification, with the AUC reaching 0.74, 0.87, 0.87, 0.89 for ACR outcomes and 0.75, 0.88, 0.89, 0.90 for MCR outcomes. The five variables contributing most to the classification of ACR outcomes were SBP, TG, TC, Hcy, and DBP; for MCR outcomes they were age, TG, SBP, Hcy, and FPG. Overall, the machine learning algorithms could serve as a warning model for CKD.
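For readers unfamiliar with the AUC figures quoted above, the following sketch shows one way the metric can be computed: as the fraction of (positive, negative) case pairs in which the positive case receives the higher score, with ties counted as one half. The function name `roc_auc` and the 0/1 label convention are assumptions for illustration, and the pairwise loop is quadratic, so it is meant for small examples rather than a dataset of this size.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Under this definition, 0.5 corresponds to scores that carry no ranking information and 1.0 to a perfect separation of the two classes.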
Conclusion
ML algorithms combined with indexes that are accessible in rural areas perform well in classification, which makes an early warning model for CKD feasible. This model could help achieve large-scale population screening for CKD in poverty-stricken areas and should be promoted to improve quality of life and reduce mortality.
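Introduction
Chronic kidney disease (CKD) is a progressive disease with a relatively high incidence, but its early symptoms are insidious and hard to detect. Because rural areas in China have insufficient medical examination resources and rely on single-disease screening schemes, the disease very easily progresses to end-stage renal failure. This study aims to use machine learning (ML) algorithms, combined with easily obtained test indicators from 10 rural areas in Shanxi Province, to build a CKD early warning model suited to impoverished areas, thereby moving diagnosis and treatment earlier and improving patients' quality of life.
Methods
From April to November 2019, the research team carried out opportunistic CKD screening in 10 rural areas in Shanxi Province. First, general information, physical examination data, and blood and urine specimens were collected from 13,550 subjects. Next, LASSO regression was used for feature selection of the explanatory variables, and the synthetic minority over-sampling technique (SMOTE) was used to balance the target datasets; the target outcomes were the albuminuria-to-creatinine ratio (ACR) and the α1-microglobulin-to-creatinine ratio (MCR). Then Bagging, Random Forest (RF), and eXtreme Gradient Boosting (XGBoost) were each used to classify ACR outcomes and MCR outcomes.
Results
The study finally included 12,330 rural residents and 20 explanatory variables. Cases with increased ACR and increased MCR numbered 1,587 (12.8%) and 1,456 (11.8%), respectively. After LASSO regression, the two datasets retained 14 and 15 explanatory variables, respectively. The three types of algorithms all performed well in classification, with area under the receiver operating characteristic curve (AUC) values of 0.74, 0.87, 0.87, 0.89 for ACR outcomes and 0.75, 0.88, 0.89, 0.90 for MCR outcomes. The five variables contributing most to the classification of ACR outcomes were, in order, systolic blood pressure (SBP), triglyceride (TG), total cholesterol (TC), homocysteine (Hcy), and diastolic blood pressure (DBP); the five contributing most to the classification of MCR outcomes were, in order, age, triglyceride (TG), systolic blood pressure (SBP), homocysteine (Hcy), and fasting plasma glucose (FPG). Overall, machine learning algorithms can be used to build an early warning model for CKD.
Conclusion
Combined with test indicators that are easy to obtain in rural areas, machine learning algorithms perform well in classification and can be used to build an early warning model for CKD. The model can support large-scale population screening for CKD in impoverished areas and should be promoted to improve patients' quality of life and reduce mortality.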
Created:
2023-01-09



