
Sparsity Oriented Importance Learning for High-Dimensional Linear Regression

Source: DataCite Commons. Updated: 2020-09-01. Indexed: 2024-07-27.
Download link:
https://tandf.figshare.com/articles/dataset/Sparsity_Oriented_Importance_Learning_for_High-dimensional_Linear_Regression/5457409/2
Resource description:
With model selection uncertainty now well recognized as nonnegligible, data analysts should no longer be satisfied with the output of a single final model from a model selection process, regardless of its sophistication. To improve reliability and reproducibility in model choice, one constructive approach is to make good use of a sound variable importance measure. Although interesting importance measures are available and increasingly used in data analysis, little theoretical justification has been given for them. In this article, we propose a new variable importance measure, sparsity oriented importance learning (SOIL), for high-dimensional regression from a sparse linear modeling perspective, taking into account the variable selection uncertainty via the use of a sensible model weighting. The SOIL method is theoretically shown to have the inclusion/exclusion property: when the model weights are properly concentrated around the true model, the SOIL importance can well separate the variables in the true model from the rest. In particular, even if the signal is weak, SOIL rarely gives variables not in the true model significantly higher importance values than those in the true model. Extensive simulations in several illustrative settings and real-data examples with guided simulations show desirable properties of the SOIL importance in contrast to other importance measures. Supplementary materials for this article are available online.
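
The abstract describes SOIL as an importance measure built from a weighting over candidate sparse models, but it does not state the formula. The sketch below illustrates one natural reading of that description, assuming the importance of a variable is the total weight of the candidate models that include it. The function name `soil_importance`, its arguments, and the example weights are hypothetical; how the model weights are obtained is left outside the sketch and is simply passed in.

```python
import numpy as np


def soil_importance(models, weights, n_vars):
    """Weighted inclusion frequency of each variable (illustrative sketch).

    models  : iterable of candidate models, each a collection of variable
              indices in range(n_vars) (assumed input format)
    weights : nonnegative model weights, one per candidate model (assumed
              to be supplied by some separate weighting procedure)
    n_vars  : total number of candidate variables
    """
    models = list(models)
    weights = np.asarray(weights, dtype=float)
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per candidate model")
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be nonnegative with a positive sum")

    # Normalize so the weights sum to one and importances lie in [0, 1].
    weights = weights / weights.sum()

    importance = np.zeros(n_vars)
    for support, w in zip(models, weights):
        # Each variable in this model's support receives the model's weight.
        idx = np.array(sorted(set(support)), dtype=int)
        importance[idx] += w
    return importance


if __name__ == "__main__":
    # Three hypothetical candidate models over four variables.
    candidate_models = [{0, 1}, {0, 1, 2}, {0}]
    model_weights = [0.6, 0.3, 0.1]
    print(soil_importance(candidate_models, model_weights, n_vars=4))
```

Under this reading, a variable that appears in every well-weighted model scores near one and a variable that appears in none scores zero, which matches the separation behavior the inclusion/exclusion property describes when the weights concentrate around the true model.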

Provider:
Taylor & Francis
Created:
2018-06-28