Data Sheet 1_Machine learning-based coronary heart disease diagnosis model for type 2 diabetes patients.docx

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Data_Sheet_1_Machine_learning-based_coronary_heart_disease_diagnosis_model_for_type_2_diabetes_patients_docx/29124584
Resource description:
Background: To establish a classification model for assisting the diagnosis of type 2 diabetes mellitus (T2DM) complicated with coronary heart disease (CHD).
Methods: Patients with T2DM who underwent coronary angiography (CA) were enrolled from seven affiliated hospitals of Chongqing Medical University. Statistical differences in clinical variables between T2DM patients with and without CHD were assessed using univariate analysis. The original data were divided into a training set and a validation set in a 7:3 ratio. The training set data were used to screen features using Logistic regression, Lasso regression, or recursive feature elimination (RFE). Five machine learning algorithms were selected for modeling: Logistic regression, Support Vector Machine (SVM), Random Forest (RF), eXtreme gradient boosting (XgBoost), and Light Gradient Boosting Machine (LightGBM). The performance of the models was verified through 5-fold cross-validation and the training set.
Results: Clinical data were collected from 1943 patients with T2DM complicated with CHD and 574 T2DM patients without CHD. Univariate analysis identified 20 optimal risk factors; because four of them had over 30% missing values, 16 risk factors were ultimately included. Logistic regression screened eight features and Lasso regression screened ten features, while the RFE method screened eight, fourteen, sixteen, and thirteen features for SVM, RF, XgBoost, and LightGBM, respectively. Among all models, the XgBoost model based on features selected by RFE+LightGBM demonstrated the best performance in the testing set, achieving an AUC of 0.814 (95% CI, 0.779-0.847), accuracy of 0.799 (95% CI, 0.771-0.827), precision of 0.841 (95% CI, 0.812-0.868), recall of 0.920 (95% CI, 0.898-0.941), and F1-score of 0.879 (95% CI, 0.859-0.897).
Conclusions: Based on T2DM data and machine learning theory, a Bayesian-optimized XgBoost model was established using the RFE+LightGBM method. This model effectively determines whether T2DM patients have CHD.
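The reported metrics can be sanity-checked with a little arithmetic. The sketch below recomputes the F1-score as the harmonic mean of the reported precision and recall, and estimates what a 7:3 split of the full cohort would look like. The function names and the rounding rule in `split_sizes` are illustrative assumptions; the abstract does not state the exact split sizes the authors obtained.

```python
# Point estimates reported in the abstract for the best XgBoost model.
REPORTED_PRECISION = 0.841
REPORTED_RECALL = 0.920
REPORTED_F1 = 0.879

# Cohort sizes reported in the abstract.
N_CHD = 1943      # T2DM patients with CHD
N_NO_CHD = 574    # T2DM patients without CHD


def f1_from_precision_recall(precision, recall):
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def split_sizes(n_total, train_fraction=0.7):
    """Approximate train/validation sizes (the rounding rule is an assumption)."""
    n_train = round(n_total * train_fraction)
    return n_train, n_total - n_train


recomputed_f1 = f1_from_precision_recall(REPORTED_PRECISION, REPORTED_RECALL)
n_train, n_valid = split_sizes(N_CHD + N_NO_CHD)

print(f"recomputed F1: {recomputed_f1:.3f} (reported {REPORTED_F1})")
print(f"approximate 7:3 split: {n_train} training, {n_valid} validation")
```

Because the inputs are rounded point estimates, agreement to three decimals is the most this check can show; it says nothing about the confidence intervals.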
Created:
2025-05-22