Bayesian Approximate Kernel Regression With Variable Selection
Source: DataCite Commons. Updated 2024-02-08; indexed 2024-07-25.
Download link:
https://tandf.figshare.com/articles/dataset/Bayesian_Approximate_Kernel_Regression_with_Variable_Selection/5325367
Description:
Nonlinear kernel regression models are often used in statistics and machine learning because they are more accurate than linear models. Variable selection for kernel regression models is a challenge partly because, unlike the linear regression setting, there is no clear concept of an effect size for regression coefficients. In this article, we propose a novel framework that provides an effect size analog for each explanatory variable in Bayesian kernel regression models when the kernel is shift-invariant—for example, the Gaussian kernel. We use function analytic properties of shift-invariant reproducing kernel Hilbert spaces (RKHS) to define a linear vector space that: (i) captures nonlinear structure, and (ii) can be projected onto the original explanatory variables. This projection onto the original explanatory variables serves as an analog of effect sizes. The specific function analytic property we use is that shift-invariant kernel functions can be approximated via random Fourier bases. Based on the random Fourier expansion, we propose a computationally efficient class of Bayesian approximate kernel regression (BAKR) models for both nonlinear regression and binary classification for which one can compute an analog of effect sizes. We illustrate the utility of BAKR by examining two important problems in statistical genetics: genomic selection (i.e., phenotypic prediction) and association mapping (i.e., inference of significant variants or loci). State-of-the-art methods for genomic selection and association mapping are based on kernel regression and linear models, respectively. BAKR is the first method that is competitive in both settings. Supplementary materials for this article are available online.
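The description states that shift-invariant kernels such as the Gaussian kernel can be approximated via random Fourier bases. The following is a minimal sketch of that one property only, not the BAKR model or the authors' code. The function names, the bandwidth parameterization exp(-||x - y||^2 / (2 * bandwidth^2)), the cosine-with-random-phase form of the basis, and the example inputs are all assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """Exact Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def sample_fourier_basis(n_inputs, n_features, bandwidth, rng):
    """Draw random frequencies and phases for the Gaussian kernel.

    Each frequency coordinate is drawn from N(0, 1 / bandwidth^2), which is the
    spectral density of the kernel above; phases are uniform on [0, 2*pi).
    """
    freqs = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(n_inputs)]
             for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return freqs, phases


def fourier_features(x, freqs, phases):
    """Map x to psi(x), where psi_j(x) = sqrt(2 / D) * cos(w_j . x + b_j)."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
            for w, b in zip(freqs, phases)]


def approx_kernel(x, y, freqs, phases):
    """Approximate k(x, y) by the inner product psi(x) . psi(y)."""
    px = fourier_features(x, freqs, phases)
    py = fourier_features(y, freqs, phases)
    return sum(a * b for a, b in zip(px, py))


# Hypothetical inputs: two points with three explanatory variables each.
rng = random.Random(0)
freqs, phases = sample_fourier_basis(n_inputs=3, n_features=5000,
                                     bandwidth=1.5, rng=rng)
x = [0.2, -1.0, 0.5]
y = [1.0, 0.3, -0.4]
exact = gaussian_kernel(x, y, 1.5)
approx = approx_kernel(x, y, freqs, phases)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

Because the approximate kernel is an ordinary inner product of the feature vectors psi(x), a regression on those features is linear in psi(x) while still capturing nonlinear structure in x. The approximation error shrinks as the number of random features grows.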
Provider:
Taylor & Francis
Created:
2017-08-18



