Rank-Based Greedy Model Averaging for High-Dimensional Survival Data

Source: NIAID Data Ecosystem (indexed 2026-03-13)

Download link:

https://figshare.com/articles/dataset/Rank-based_Greedy_Model_Averaging_for_High-Dimensional_Survival_Data/20085520

Description:

Model averaging is an effective way to enhance prediction accuracy. However, most previous work focuses on low-dimensional settings with completely observed responses. To accurately predict the risk effect of survival data with high-dimensional predictors, we propose a novel method: rank-based greedy (RG) model averaging. Specifically, adopting the transformation model with splitting predictors as working models, we use the smooth concordance index function twice: once to derive the candidate predictions and once to obtain the optimal model weights. The final prediction is a weighted average of all the candidates. Our approach is flexible, computationally efficient, and robust against model misspecification, as it neither requires the correctness of a joint model nor involves the estimation of the transformation function. We further adopt a greedy algorithm for high dimensions. Theoretically, we derive an asymptotic error bound for the optimal weights under some mild conditions. In addition, the sum of the weights assigned to the correct candidate submodels is proven to approach one in probability when correct models are included among the candidate submodels. Extensive numerical studies are carried out using both simulated and real datasets to show the proposed approach's robust performance compared with existing regularization approaches. Supplementary materials for this article are available online.
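To make the weighting step in the description more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a sigmoid-smoothed concordance index with a hypothetical bandwidth `h`, and a simple forward greedy rule that repeatedly adds whichever candidate most improves the smoothed index of the running average. The function names `smooth_cindex` and `greedy_weights`, the bandwidth, the step count, and the toy data are all illustrative assumptions; the abstract does not specify them.

```python
import math


def smooth_cindex(score, time, event, h=0.1):
    """Sigmoid-smoothed concordance index for right-censored data.

    Assumption: a pair (i, j) is comparable when subject i has an observed
    event and time[i] < time[j]; a higher score means a higher risk.
    """
    num, n_pairs = 0.0, 0
    for i in range(len(score)):
        if not event[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(score)):
            if time[i] < time[j]:
                z = (score[i] - score[j]) / h
                num += 0.5 * (1.0 + math.tanh(0.5 * z))  # stable form of the sigmoid
                n_pairs += 1
    if n_pairs == 0:
        raise ValueError("no comparable pairs")
    return num / n_pairs


def greedy_weights(candidates, time, event, n_steps=20, h=0.1):
    """Forward greedy selection of averaging weights (illustrative rule only).

    candidates: list of M candidate score lists, each of length n.
    Returns M non-negative weights that sum to one.
    """
    m, n = len(candidates), len(time)
    counts = [0] * m
    total = [0.0] * n  # running sum of the scores selected so far
    for step in range(1, n_steps + 1):
        best_k, best_val = 0, -1.0
        for k in range(m):
            # Smoothed C-index of the average if candidate k were added now.
            trial = [(total[i] + candidates[k][i]) / step for i in range(n)]
            val = smooth_cindex(trial, time, event, h)
            if val > best_val:
                best_k, best_val = k, val
        total = [total[i] + candidates[best_k][i] for i in range(n)]
        counts[best_k] += 1
    return [c / n_steps for c in counts]


# Toy data, made up for illustration: risk x, survival time decreasing in x,
# no censoring. One candidate ranks subjects correctly, the other in reverse.
x = [i / 10 for i in range(-20, 21)]
time = [math.exp(-v) for v in x]
event = [1] * len(x)
weights = greedy_weights([x, [-v for v in x]], time, event)
```

Because each greedy step adds one candidate to an equally weighted running average, the resulting weights are selection frequencies, so they stay on the simplex without an explicit constraint. On the toy data, the weights show how the rule splits mass between the correctly ordered and the reversed candidate.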

Created:

2022-06-16
