Conditional Sure Independence Screening
Source: NIAID Data Ecosystem (indexed 2026-03-09)
Download link:
https://figshare.com/articles/dataset/Conditional_Sure_Independence_Screening/1581602
Description:
Independence screening is powerful for variable selection when the number of variables is massive. Commonly used independence screening methods are based on marginal correlations or their variants. When prior knowledge of a certain important set of variables is available, a natural assessment of the relative importance of the other predictors is their conditional contribution to the response given the known set of variables. This results in conditional sure independence screening (CSIS). CSIS produces a rich family of alternative screening methods through different choices of the conditioning set, and it can help reduce the number of false positive and false negative selections when covariates are highly correlated. This article proposes and studies CSIS in generalized linear models. We give conditions under which sure screening is possible and derive an upper bound on the number of selected variables. We also spell out the situations under which CSIS yields model selection consistency, and the properties of CSIS when a data-driven conditioning set is used. Moreover, we provide two data-driven methods to select the thresholding parameter of conditional screening. The utility of the procedure is illustrated by simulation studies and the analysis of two real datasets. Supplementary materials for this article are available online.
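The conditional-contribution idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the simplest generalized linear model (Gaussian response, identity link, fitted by ordinary least squares), and the function name `csis_linear`, its parameters, and the fixed-size cutoff `n_keep` are hypothetical choices made for illustration. The data-driven thresholding methods mentioned in the description are not reproduced here.

```python
import numpy as np


def csis_linear(X, y, cond_idx, n_keep):
    """Rank candidate variables by their conditional contribution to y.

    Hypothetical helper: for each column j outside cond_idx, regress y on
    an intercept, the conditioning columns, and column j, then score j by
    the absolute value of its fitted coefficient.
    """
    n, p = X.shape
    # Standardize columns so coefficient magnitudes are comparable.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    cond = list(cond_idx)
    cond_set = set(cond)
    base = np.column_stack([np.ones(n), Xs[:, cond]])
    candidates = [j for j in range(p) if j not in cond_set]
    scores = {}
    for j in candidates:
        design = np.column_stack([base, Xs[:, j]])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        scores[j] = abs(coef[-1])  # coefficient of the candidate column
    ranked = sorted(candidates, key=lambda j: scores[j], reverse=True)
    return ranked[:n_keep], scores


# Illustrative data (an assumption, not from the article): column 5 is
# correlated with column 0 and its effect cancels marginally, so its
# covariance with y is zero by construction.
rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.standard_normal((n, p))
X[:, 5] = 0.5 * X[:, 0] + np.sqrt(0.75) * X[:, 5]
y = 3.0 * X[:, 0] - 1.5 * X[:, 5] + rng.standard_normal(n)

selected, scores = csis_linear(X, y, cond_idx=[0], n_keep=3)
print("selected after conditioning on column 0:", selected)
```

In this constructed example the population covariance between column 5 and the response is zero, so a purely marginal ranking would tend to overlook it, while conditioning on column 0 exposes its contribution. For a non-Gaussian response, the least-squares fit would be swapped for the corresponding GLM likelihood fit.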
Created:
2015-10-15



