Comparing LASSO and IPF-LASSO for multi-modal data: variable selection with Type I error control

Figshare · Updated 2025-04-07 · Indexed 2026-04-28

Download link:

https://figshare.com/articles/dataset/Comparing_LASSO_and_IPF-LASSO_for_multi-modal_data_variable_selection_with_Type_I_error_control/28743185

Description:

Variable selection in high-dimensional regression models is challenging, so stable and reliable selection methods are essential. Omics data, a common source of high-dimensional data, add a further complication: several genomic layers (data modalities) must be integrated into one analysis. The IPF-LASSO model has previously addressed this by assigning a separate penalty parameter to each data modality. However, selecting variables from heterogeneous data layers while controlling the Type I error remains an open problem. To address it, we applied stability selection to control the number of false positives in both IPF-LASSO and the standard LASSO. We then compared the two methods to investigate whether modality-specific penalty parameters increase statistical power while keeping false positives under control. The simulations covered two high-dimensional data structures, one with independent data and one with correlated data. We also applied both models to breast cancer treatment data, where IPF-LASSO identified relevant clinical variables.
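
The difference between the two models can be written as one penalized least-squares problem. This is a sketch assuming a linear model with squared-error loss (other outcome types would use a negative log-likelihood instead); the symbols $M$ (number of modalities), $G_m$ (indices of the features in modality $m$) and $\mathrm{pf}_m$ (penalty factor) are introduced here for illustration.

```latex
\hat{\beta} = \arg\min_{\beta}\; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  \;+\; \sum_{m=1}^{M} \lambda_m \sum_{j \in G_m} \lvert \beta_j \rvert,
\qquad
\text{LASSO: } \lambda_1 = \dots = \lambda_M = \lambda,
\qquad
\text{IPF-LASSO: } \lambda_m = \lambda \cdot \mathrm{pf}_m .
```

With all penalty factors equal to 1, IPF-LASSO reduces to the standard LASSO, which is what makes the two methods directly comparable.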
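
In stability selection, controlling the Type I error usually means bounding the expected number of falsely selected variables. One standard form is the bound of Meinshausen and Bühlmann (2010), shown below; it assumes exchangeable noise variables and a selection procedure no worse than random guessing. This summary does not say whether the study uses exactly this bound or a refinement such as the complementary-pairs version of Shah and Samworth. Here $B$ subsamples of size $\lfloor n/2 \rfloor$ are drawn, $\hat{S}^{(b)}$ is the set selected on subsample $b$, $q$ is the expected number of variables selected per subsample, $p$ is the total number of variables, and $V$ is the number of false positives in the stable set.

```latex
\hat{\Pi}_j = \frac{1}{B}\sum_{b=1}^{B} \mathbf{1}\{\, j \in \hat{S}^{(b)} \,\},
\qquad
\hat{S}_{\mathrm{stable}} = \{\, j : \hat{\Pi}_j \ge \pi_{\mathrm{thr}} \,\},
\qquad
\mathbb{E}[V] \le \frac{q^2}{(2\pi_{\mathrm{thr}} - 1)\, p},
\quad \pi_{\mathrm{thr}} \in \left(\tfrac{1}{2}, 1\right].

% Hypothetical numbers: p = 500, q = 10, \pi_{thr} = 0.9
% \mathbb{E}[V] \le 100 / (0.8 \cdot 500) = 0.25
```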
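
The procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it uses scikit-learn's `Lasso`, emulates modality-specific penalties by rescaling columns (fitting $X_j / \mathrm{pf}_j$ with an unweighted penalty yields the same nonzero pattern as the weighted fit), fixes a single `alpha` and hypothetical penalty factors `[1.0, 2.0]` instead of choosing them by cross-validation as IPF-LASSO typically does, and subsamples half of the observations without replacement. The function names, the threshold 0.75 and the toy data are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso


def weighted_lasso_support(X, y, modality, penalty_factors, alpha):
    """Indices selected by a LASSO whose L1 penalty on feature j is
    alpha * penalty_factors[modality[j]] (an IPF-LASSO-style fit).

    sklearn's Lasso has no per-feature penalty, so we rescale: fitting
    column X_j / pf_j with an unweighted penalty gives gamma_j = pf_j * beta_j,
    which has exactly the same nonzero pattern as the weighted fit.
    """
    pf = np.asarray(penalty_factors, dtype=float)[modality]  # one factor per feature
    model = Lasso(alpha=alpha, max_iter=10_000)
    model.fit(X / pf, y)
    return np.flatnonzero(model.coef_)


def stability_selection(X, y, modality, penalty_factors, alpha,
                        n_subsamples=100, pi_thr=0.75, seed=0):
    """Selection frequencies over half-size subsamples (drawn without
    replacement), the stable set {j : freq_j >= pi_thr}, and the plug-in
    value of the bound q^2 / ((2 * pi_thr - 1) * p) on E[false positives]."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    sizes = []
    for _ in range(n_subsamples):
        rows = rng.choice(n, size=n // 2, replace=False)
        support = weighted_lasso_support(X[rows], y[rows], modality,
                                         penalty_factors, alpha)
        counts[support] += 1
        sizes.append(len(support))
    freq = counts / n_subsamples
    q_hat = np.mean(sizes)  # estimate of q: average number selected per subsample
    bound = q_hat ** 2 / ((2 * pi_thr - 1) * p)
    return np.flatnonzero(freq >= pi_thr), freq, bound


# Hypothetical toy data: modality 0 has 20 features, modality 1 has 480.
rng = np.random.default_rng(1)
n, p0, p1 = 200, 20, 480
X = rng.standard_normal((n, p0 + p1))
modality = np.repeat([0, 1], [p0, p1])
beta = np.zeros(p0 + p1)
beta[:3] = 2.0          # strong signals in the small modality
beta[p0:p0 + 3] = 1.5   # signals in the large modality
y = X @ beta + rng.standard_normal(n)

selected, freq, bound = stability_selection(X, y, modality, [1.0, 2.0], alpha=0.3)
```

Passing `penalty_factors=[1.0, 1.0]` turns the same routine into stability selection with the standard LASSO, which is the comparison the study makes. The rescaling trick is needed only because scikit-learn's `Lasso` has no per-feature penalty argument; in R, the `penalty.factor` argument of `glmnet` serves the same purpose.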

Created: 2025-04-07