Split Regularized Regression

Name: Split Regularized Regression
Creator: Taylor & Francis
Published: 2021-09-29 16:44:15
License: No description available

Source: DataCite Commons; updated 2021-09-29; indexed 2024-07-27

Download link:

https://tandf.figshare.com/articles/dataset/Split_Regularized_Regression/8865995

Description:

We propose an approach for fitting linear regression models that splits the set of covariates into groups. The optimal split of the variables into groups and the regularized estimation of the regression coefficients are performed by minimizing an objective function that encourages sparsity within each group and diversity among them. The estimated coefficients are then pooled together to form the final fit. Our procedure works on top of a given penalized linear regression estimator (e.g., Lasso, elastic net) by fitting it to possibly overlapping groups of features, encouraging diversity among these groups to reduce the correlation of the corresponding predictions. For the case of two groups, elastic net penalty and orthogonal predictors, we give a closed form solution for the regression coefficients in each group. We establish the consistency of our method with the number of predictors possibly increasing with the sample size. An extensive simulation study and real-data applications show that in general the proposed method improves the prediction accuracy of the base estimator used in the procedure. Possible extensions to GLMs and other models are discussed. The supplemental material for this article, available online, contains the proofs of our theoretical results and the full results of our simulation study.
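The description above does not state the objective function itself, so the following is only a minimal sketch of one objective of the kind described, under loudly stated assumptions: a squared-error loss per group, a Lasso (L1) penalty for sparsity within each group, a diversity penalty equal to the sum over pairs of groups of products of absolute coefficients, and pooling by simple averaging. The names `split_objective`, `pooled_coefficients`, `lam_sparsity`, and `lam_diversity` are hypothetical and are not taken from the article or from any package.

```python
import numpy as np


def split_objective(X, y, B, lam_sparsity, lam_diversity):
    """Evaluate an assumed split-regression objective.

    B has shape (p, G): column g holds the coefficient vector of group g.
    The penalty forms below are illustrative assumptions, not the article's
    confirmed definition.
    """
    n = X.shape[0]
    # Squared-error loss of each group's model, summed over groups.
    residuals = y[:, None] - X @ B
    loss = (residuals ** 2).sum() / (2 * n)

    abs_b = np.abs(B)
    # Sparsity within each group: L1 norm of every column.
    sparsity = lam_sparsity * abs_b.sum()

    # Diversity among groups: for each feature j, sum over ordered pairs
    # g != h of |B[j, g]| * |B[j, h]|, computed as
    # (sum_g |B[j, g]|)^2 - sum_g |B[j, g]|^2. The factor 0.5 turns the
    # ordered-pair count into an unordered-pair count.
    cross = (abs_b.sum(axis=1) ** 2 - (abs_b ** 2).sum(axis=1)).sum()
    diversity = 0.5 * lam_diversity * cross

    return loss + sparsity + diversity


def pooled_coefficients(B):
    """Pool the group-wise coefficients; averaging is an assumption here."""
    return B.mean(axis=1)
```

Under these assumptions, two groups that use disjoint sets of features incur no diversity penalty, while groups that share a feature are penalized in proportion to the product of that feature's coefficient magnitudes, which is what pushes the groups apart.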

Provider:

Taylor & Francis

Created:

2019-07-12
