A Generic Sure Independence Screening Procedure
Taylor & Francis Group. Updated: 2024-08-07. Indexed: 2026-04-16.
Download link:
https://tandf.figshare.com/articles/dataset/A_Generic_Sure_Independence_Screening_Procedure/6201065/1
Description:
Extracting important features from ultra-high dimensional data is one of the primary tasks in statistical learning, information theory, precision medicine, and biological discovery. Many of the sure independence screening methods developed to meet these needs are suitable for special models under some assumptions. With the availability of more data types and possible models, a model-free generic screening procedure with fewer and less restrictive assumptions is desirable. In this article, we propose a generic nonparametric sure independence screening procedure, called BCor-SIS, on the basis of a recently developed universal dependence measure: Ball correlation. We show that the proposed procedure has strong screening consistency even when the dimensionality is an exponential order of the sample size without imposing sub-exponential moment assumptions on the data. We investigate the flexibility of this procedure by considering three commonly encountered challenging settings in biological discovery or precision medicine: iterative BCor-SIS, interaction pursuit, and survival outcomes. We use simulation studies and real data analyses to illustrate the versatility and practicability of our BCor-SIS method. Supplementary materials for this article are available online.
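The general shape of a model-free marginal screening step, as described in the abstract, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the dependence measure used here is distance correlation, swapped in as a stand-in because it is short to write down, whereas BCor-SIS ranks features by Ball correlation. The function names (`distance_correlation`, `marginal_screen`), the simulated data, and the default cutoff of n / log(n) retained features (a common convention in the screening literature) are all assumptions made for this example.

```python
import numpy as np


def _centered_dist(v):
    """Double-centered Euclidean distance matrix of the samples in v."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation (stand-in measure, NOT Ball correlation)."""
    a, b = _centered_dist(x), _centered_dist(y)
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))


def marginal_screen(X, y, measure=distance_correlation, d=None):
    """Score each column of X against y and keep the d highest-scoring ones.

    `measure` is pluggable: any function (x, y) -> float can be passed,
    including a Ball correlation implementation if one is available.
    """
    n, p = X.shape
    if d is None:
        d = max(1, int(n / np.log(n)))  # assumed default cutoff
    scores = np.array([measure(X[:, j], y) for j in range(p)])
    keep = np.argsort(scores)[::-1][:d]  # indices sorted by descending score
    return keep, scores


# Hypothetical simulated data: 500 features, only the first two matter,
# one through a quadratic effect and one through a linear effect.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] ** 2 + 2 * X[:, 1] + 0.1 * rng.standard_normal(n)

keep, scores = marginal_screen(X, y)
print("retained features:", len(keep))
print("top five indices:", keep[:5])
```

Because the measure is passed in as an argument, the ranking and cutoff logic stays the same whichever dependence measure is used; only the scoring function changes.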
Provided by:
Wang, Xueqin; Zhu, Hongtu; Pan, Wenliang; Xiao, Weinan
Created:
2018-04-30



