Variable selection and importance in presence of high collinearity: an application to the prediction of lean body mass from multi-frequency bioelectrical impedance
Source: DataCite Commons. Updated: 2022-05-11. Indexed: 2024-07-28.
Download link:
https://tandf.figshare.com/articles/dataset/Variable_selection_and_importance_in_presence_of_high_collinearity_an_application_to_the_prediction_of_lean_body_mass_from_multi-frequency_bioelectrical_impedance/12296036/1
Description:
In prediction problems, both the response and the covariates may be highly correlated with a second group of influential regressors, which can be considered background variables. An important challenge is to perform variable selection and importance assessment among the covariates in the presence of these variables. A clinical example is the prediction of lean body mass (response) from bioimpedance (covariates), where anthropometric measures play the role of background variables. We introduce a reduced dataset in which the variables are defined as the residuals with respect to the background, and perform variable selection and importance assessment in both linear and random forest models. Using a clinical dataset of multi-frequency bioimpedance, we show the effectiveness of this method in selecting the most relevant predictors of lean body mass beyond anthropometry.
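The "reduced dataset" idea in the description can be illustrated with a short sketch. This is not the authors' code. It assumes that "residuals with respect to the background" means residuals from an ordinary least squares fit on the background variables plus an intercept. The data are synthetic, the helper name `residualize` is hypothetical, and the absolute standardized coefficient used as an importance score is only a stand-in for whatever measure the paper actually uses.

```python
import numpy as np


def residualize(values, background):
    """Residuals of an OLS fit of `values` on `background` plus an intercept.

    `values` may be a single vector (the response) or a matrix of covariates;
    each column is regressed on the same background design matrix.
    """
    values = np.asarray(values, dtype=float)
    design = np.column_stack([np.ones(len(background)), background])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef


# Synthetic data, for illustration only (no clinical values are used here).
rng = np.random.default_rng(0)
n = 500
background = rng.normal(size=(n, 2))   # stand-ins for anthropometric measures
noise = rng.normal(size=(n, 3))        # covariate-specific signal

# Covariates and response are both strongly driven by the background.
covariates = background @ rng.normal(size=(2, 3)) + 0.3 * noise
response = (background @ np.array([2.0, -1.0])
            + 1.5 * noise[:, 0]        # only covariate 0 adds information
            + 0.1 * rng.normal(size=n))

# Reduced dataset: every variable replaced by its residual w.r.t. the background.
reduced_X = residualize(covariates, background)
reduced_y = residualize(response, background)

# Assumed importance score: absolute coefficient of a linear model fitted on
# the reduced data, with covariates scaled to unit standard deviation.
scaled_X = reduced_X / reduced_X.std(axis=0)
coef, *_ = np.linalg.lstsq(scaled_X, reduced_y, rcond=None)
importance = np.abs(coef)
ranking = np.argsort(importance)[::-1]  # most important covariate first
```

Because the residuals are orthogonal to the background by construction, any importance computed on `reduced_X` and `reduced_y` reflects information beyond the background variables. A random forest could be fitted on the same reduced arrays in place of the linear model.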
Provider:
Taylor & Francis
Created:
2020-05-13



