
Top 20 important factors.

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Top_20_important_factors_/27700162
Description:
This paper proposes machine learning models that predict a person's future risk of hypertension from routine health checkup data recorded at their current and past visits to a health checkup center. The large-scale, high-dimensional dataset used in this study comes from the MJ Health Research Foundation in Taiwan. The training data is split into 5 folds and used to train 5 models in a 5-fold cross-validation manner. For the test set, the voted result of the 5 models is used as the final prediction. Experimental results show that the models achieve 69.59% precision, 77.90% recall, and a 73.51% F1-score, outperforming a baseline that uses only the blood pressure from each visitor's last visit. Experiments also show that visitors who have health checkups more often are predicted more accurately, and that models trained with selected important factors achieve better results than those trained with the Framingham risk score. We also demonstrate the possibility of using the models to advise visitors on weight control by adding virtual visits to the model input; these virtual visits assume that the visitor's body weight can be reduced in the near future. Experimental results show that around 5.48% of the true positive cases with a high Body Mass Index are re-judged as negative, and a rising trend appears as more virtual visits are added. This may be used to suggest to visitors that controlling their body weight for a longer time leads to a lower probability of having hypertension in the future.
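The fold-and-vote procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the single-feature threshold learner, the toy blood-pressure-like values, and all function names (`k_fold_indices`, `train_threshold_model`, `train_fold_models`, `vote`, `f1_from`) are hypothetical stand-ins chosen only to show how 5 models trained on 5 different fold splits can be combined by majority vote, and how the reported F1-score follows from the reported precision and recall.

```python
from typing import Callable, List, Sequence

Model = Callable[[float], int]


def k_fold_indices(n: int, k: int = 5) -> List[List[int]]:
    """Split range(n) into k near-equal contiguous folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_threshold_model(xs: Sequence[float], ys: Sequence[int]) -> Model:
    """Hypothetical stand-in learner (assumes both classes are present):
    predict 1 when x is at or above the midpoint of the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x >= threshold else 0


def train_fold_models(xs: Sequence[float], ys: Sequence[int], k: int = 5) -> List[Model]:
    """Train k models; model i is fit on every fold except fold i."""
    models = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        models.append(train_threshold_model(train_x, train_y))
    return models


def vote(models: Sequence[Model], x: float) -> int:
    """Majority vote: positive when more than half of the models say positive."""
    positives = sum(m(x) for m in models)
    return 1 if positives * 2 > len(models) else 0


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration): even indices are negatives with
# lower values, odd indices are positives with higher values.
xs = [110.0 + i if i % 2 == 0 else 140.0 + i for i in range(20)]
ys = [i % 2 for i in range(20)]

models = train_fold_models(xs, ys, k=5)
print(vote(models, 160.0), vote(models, 105.0))

# Consistency check on the reported metrics: rounds to 0.7351 (73.51%).
print(round(f1_from(0.6959, 0.7790), 4))
```

With an odd number of models, a majority vote on a binary label cannot tie, which is one practical reason a 5-model ensemble is convenient here.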

Created: 2024-11-13