
Comparing LASSO and IPF-LASSO for multi-modal data: variable selection with Type I error control

Source: Figshare · Updated: 2025-04-07 · Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/Comparing_LASSO_and_IPF-LASSO_for_multi-modal_data_variable_selection_with_Type_I_error_control/28743185
Description:
Variable selection in high-dimensional regression models is challenging. Thus, developing stable and reliable methods for variable selection is essential. Omics data, a common source of high-dimensional data, brings the added complexity of integrating diverse genomic layers into the analysis. The IPF-LASSO model has previously addressed this by applying distinct penalty parameters for each data modality. However, incorporating heterogeneous data layers into variable selection with Type I error control remains an open problem. To address this, we applied stability selection to control the number of false positives in both IPF-LASSO and standard LASSO models. Our study aimed to compare the two methods, investigating whether introducing different penalty parameters per data modality enhances statistical power while controlling false positives. Two high-dimensional data structures were investigated in simulations, one with independent data and the other with correlated data. We also applied the models to breast cancer treatment data, where IPF-LASSO identified relevant clinical variables.
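
The comparison described above can be illustrated with a small, self-contained sketch. It is not the authors' code: it assumes that IPF-LASSO can be written as a LASSO with one penalty factor per modality, that stability selection is run by refitting on random half-samples at a fixed penalty, and that the expected number of false positives is bounded by the classical Meinshausen-Bühlmann inequality E[V] <= q^2 / ((2*pi_thr - 1) * p). The function names, the toy data, the penalty factors (1 for a "clinical" block, 2 for an "omics" block), the penalty `lam=0.3`, and the threshold 0.75 are all hypothetical choices made for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by LASSO coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, penalty_factors, n_iter=50):
    """Minimise (1/2n)||y - Xb||^2 + lam * sum_j pf_j * |b_j| (no intercept).

    Equal penalty factors give the standard LASSO; one factor per modality
    gives an IPF-LASSO-style fit (an assumption of this sketch).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X @ beta, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * penalty_factors[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


def selection_frequencies(X, y, lam, penalty_factors, n_subsamples=40, seed=0):
    """Fraction of random half-samples in which each variable is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        beta = weighted_lasso([X[i] for i in idx], [y[i] for i in idx],
                              lam, penalty_factors)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_subsamples for c in counts]


def expected_false_positive_bound(q, threshold, p):
    """Meinshausen-Bühlmann bound q^2 / ((2*threshold - 1) * p) on E[V]."""
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must lie in (0.5, 1]")
    return q * q / ((2 * threshold - 1) * p)


def make_toy_data(n=60, p_clinical=3, p_omics=12, seed=1):
    """Hypothetical data: only the first 'clinical' variable carries signal."""
    rng = random.Random(seed)
    p = p_clinical + p_omics
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] + rng.gauss(0, 0.5) for row in X]
    return X, y


X, y = make_toy_data()
settings = {
    "LASSO": [1.0] * 15,                            # one common penalty
    "IPF-LASSO-style": [1.0] * 3 + [2.0] * 12,      # heavier penalty on omics
}
for name, pf in settings.items():
    freqs = selection_frequencies(X, y, lam=0.3, penalty_factors=pf)
    stable = [j for j, f in enumerate(freqs) if f >= 0.75]
    q_hat = sum(freqs)  # average number of variables selected per subsample
    bound = expected_false_positive_bound(q_hat, 0.75, len(freqs))
    print(name, "stable set:", stable, "bound on E[V]:", round(bound, 3))
```

In practice the penalty factors and the number of selected variables per subsample would be tuned rather than fixed by hand, and the bound holds only under the assumptions of the original stability selection result, so the sketch shows the mechanics rather than a calibrated analysis.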

Created:
2025-04-07