Machine learning algorithms identify hypokalaemia risk in people with hypertension in the United States National Health and Nutrition Examination Survey 1999–2018

DataCite Commons | Updated 2023-05-18 | Indexed 2024-08-26
Download link:
https://tandf.figshare.com/articles/dataset/Machine_learning_algorithms_identify_hypokalaemia_risk_in_people_with_hypertension_in_the_United_States_National_Health_and_Nutrition_Examination_Survey_1999_2018/22926702/1
Description:
Hypokalaemia is a side-effect of diuretics. We aimed to use machine learning to identify features predicting hypokalaemia risk in hypertensive patients. Participants with hypertension in the United States National Health and Nutrition Examination Survey 1999–2018 were included for analysis. To select the most suitable algorithm, we tested and evaluated five machine learning algorithms commonly employed in epidemiological studies: Logistic Regression, k-Nearest Neighbor, Random Forest, Recursive Partitioning and Regression Trees, and eXtreme Gradient Boosting. These algorithms were assessed using a set of 38 screened features. We then used SHapley Additive exPlanations (SHAP) values to select the key hypokalaemia-associated features in the hypertension group and its cardiovascular disease (CVD) subgroup, and to determine how each key feature affects hypokalaemia risk. A total of 25,326 hypertensive participants were included for analysis, of whom 4,511 had known CVD. The Random Forest algorithm had the highest AUROC (hypertension dataset: 0.73 [95%CI, 0.71–0.76]; CVD subgroup: 0.72 [95%CI, 0.66–0.78]). Moreover, the nomogram based on the top twelve key features screened by Random Forest retained good performance. In the hypertension dataset, these features were age, sex, race, poverty income ratio, body mass index, systolic and diastolic blood pressure, non-potassium-sparing diuretics use and duration, renin-angiotensin blockers use and duration, and CVD history; in the CVD subgroup, the additional key features were comorbid diabetes, education level, smoking status, and use of bronchodilators. Our predictive model based on the Random Forest algorithm performed best among the five algorithms evaluated. Hypokalaemia-associated key features have been identified in hypertensive patients and the subgroup with CVD.
These findings from machine learning facilitate the development of artificial intelligence to highlight hypokalaemia risk in hypertensive patients.

Key messages:
1. Our predictive model based on the Random Forest algorithm performed best among the five algorithms evaluated, and hypokalaemia-associated key features have been identified in hypertensive patients and the subgroup with cardiovascular disease.
2. The nomogram we developed, which includes twelve key features, might be useful in primary clinical consultations to identify hypertensive patients at risk of hypokalaemia.
3. These findings from machine learning facilitate the development of artificial intelligence to highlight hypokalaemia risk in hypertensive patients.
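
The description reports each AUROC with a 95% CI but does not say how the interval was obtained. As a rough illustration only, the sketch below computes an AUROC from the rank-sum identity and a percentile bootstrap interval. The bootstrap choice, the function names `auroc` and `bootstrap_ci`, and the synthetic labels and scores are all assumptions made for this example; they are not taken from the dataset or the authors' code.

```python
import random


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity; ties get average ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[k] for k in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[k] for k in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Synthetic stand-in for model output: about 10% positives, with scores
# shifted upward for the positive class. Not NHANES data.
rng = random.Random(42)
labels = [1 if rng.random() < 0.1 else 0 for _ in range(2000)]
scores = [rng.gauss(0.8 if y else 0.0, 1.0) for y in labels]

point = auroc(labels, scores)
lo, hi = bootstrap_ci(labels, scores, n_boot=500)
print(f"AUROC {point:.2f} [95% CI, {lo:.2f}-{hi:.2f}]")
```

The wider interval reported for the CVD subgroup (0.66–0.78 versus 0.71–0.76) is what one would expect from a smaller sample; in this sketch, shrinking the synthetic sample size widens the bootstrap interval in the same way.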

Provider:
Taylor & Francis
Created:
2023-05-18