
Complete dataset obtained by filtering.

Figshare | Updated 2025-09-24 | Indexed 2026-04-28
Download link:
https://figshare.com/articles/dataset/Complete_dataset_obtained_by_filtering_/30200916
Resource description:
In response to Taiwan’s rapidly aging population and the rising demand for personalized health care, accurately assessing individual physiological aging has become an essential area of study. This research uses health examination data to propose a machine learning-based biological age prediction model that quantifies physiological age through residual life estimation. The model is built on LightGBM, which shows an 11.40% improvement in predictive performance (R-squared) over the XGBoost model. In the experiments, imputing missing data with MICE significantly improved prediction accuracy, yielding a 23.35% gain in predictive performance. Survival analysis with the Kaplan-Meier (K-M) estimator showed that the model effectively differentiates between groups with varying health levels, supporting the validity of biological age as a health status indicator. The model also identified the ten biomarkers most influential in aging for both men and women; these show a 69.23% overlap with Taiwan’s leading causes of death and previously identified top health-impact factors, further validating the model's practical relevance. The model generates multidimensional health recommendations from SHAP and Pearson correlation coefficient (PCC) interpretations; if these recommendations are implemented, 64.58% of individuals could potentially extend their life expectancy. This study provides new methodological support and data backing for precision health interventions and life extension.
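
The Kaplan-Meier estimator mentioned in the description can be illustrated with a short sketch. This is a minimal standard-library Python version of the product-limit formula, not the authors' code: the function name `kaplan_meier` and the toy follow-up values are hypothetical and are not drawn from the dataset.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Product-limit survival estimate.

    durations: follow-up time for each individual
    events:    1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each observed death time.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # deaths and censorings both leave the risk set

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data: follow-up in years, 1 = death observed, 0 = censored.
toy_durations = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_durations, toy_events)
```

Computing one such curve per group (for example, individuals whose predicted biological age is below versus above their chronological age) and comparing the curves is one way to check that a model separates groups with different health levels, which is the kind of comparison the description reports.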

Created:
2025-09-24