The prediction performance from different models.
Source: Figshare. Updated: 2025-11-13. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/The_prediction_performance_from_different_models_/30612258
Description:
Background: The risk of coronary heart disease (CHD) over a specific period of years can be assessed using scores calculated by models such as the pooled cohort equations (PCEs) and the Framingham Risk Score. However, few studies have addressed quantitative on-site estimation of CHD risk, with score calculation used as an auxiliary diagnostic. Researchers have recently introduced new technologies, such as machine learning, as effective CHD risk prediction models, but these models still need to be validated on real clinical data before their use is promoted in real clinical settings.

Objective: The aim of this study is to predict CHD risk for a high-risk population using only clinical data consisting of vital traits, laboratory measurements, diagnoses, medical device testing and medications. The prediction model can serve as an on-site quantitative indicator of CHD risk for potential patients before diagnosis by coronary arteriography.

Methods: This work is designed as a retrospective study of a hospital-based cohort (The Second Affiliated Hospital of Guangxi Medical University), comprising 20,821 patients with CHD and 9,796 controls from 2017 to 2024. A two-layer machine learning model (TLML) is built on the prediction results of a random forest and a gradient boosting decision tree to combine the merits of both models. The models were trained and validated with the clinical data in the cohort.

Results: The TLML presented in this study achieved good accuracy (0.79, 95% CI 0.79–0.80), sensitivity (0.79, 95% CI 0.79–0.80) and specificity (0.79, 95% CI 0.79–0.79) for on-site CHD prediction. Compared with the PCEs (accuracy = 0.59, sensitivity = 0.58 and specificity = 0.60), the TLML shows markedly better on-site CHD prediction performance. Predictor importance analysis shows that age, diabetes, antihypertensive medications, total bilirubin, hypertension, obstructive sleep apnea-hypopnea syndrome, red cell count, hemoglobin, cystatin C, retinol-binding protein, gender and low-density lipoprotein cholesterol level are the most important variables for on-site CHD prediction. All of these features have been reported in previous studies to be associated with CHD to some degree. A reduced-complexity model that uses only 20 predictors is also presented to increase practicality; it achieves a prediction accuracy of 0.73.

Conclusions: The machine learning models presented in this study have the potential to become an auxiliary on-site diagnostic tool for CHD because of their accurate prediction and the easy availability of all the required prediction variables.
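
For readers who want to compare models with the same three metrics reported in the results, the following is a minimal sketch of how accuracy, sensitivity and specificity can be computed from binary labels (1 = CHD, 0 = control). The description does not state how the 95% confidence intervals were obtained, so the Wilson score interval used here is an assumption, and the function names and toy labels are hypothetical and are not taken from the dataset.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = CHD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed CI method, z=1.96 for 95%)."""
    if total == 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


def summarize(y_true, y_pred):
    """Return {metric: (point estimate, (ci_low, ci_high))}."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    n = tp + fn + tn + fp
    return {
        "accuracy": ((tp + tn) / n, wilson_interval(tp + tn, n)),
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }


# Hypothetical toy labels: 10 CHD cases followed by 10 controls.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 8 + [0] * 2 + [0] * 6 + [1] * 4

report = summarize(y_true, y_pred)
for name, (value, (lo, hi)) in report.items():
    print(f"{name}: {value:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Sensitivity is computed over CHD cases only and specificity over controls only, so with an imbalanced cohort such as the one described (roughly two cases per control) the two can differ substantially from overall accuracy.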
Created:
2025-11-13



