FAStEN: An Efficient Adaptive Method for Feature Selection and Estimation in High-Dimensional Functional Regressions

Source: DataCite Commons. Updated: 2024-11-22. Indexed: 2024-11-06.
Download link:
https://tandf.figshare.com/articles/dataset/FAStEN_An_Efficient_Adaptive_Method_for_Feature_Selection_and_Estimation_in_High-Dimensional_Functional_Regressions/27122532/1
Description:
Functional regression analysis is an established tool for many contemporary scientific applications. Regression problems involving large and complex datasets are ubiquitous, and feature selection is crucial for avoiding overfitting and achieving accurate predictions. We propose a new, flexible and ultra-efficient approach to perform feature selection in a sparse, high-dimensional function-on-function regression problem, and we show how to extend it to the scalar-on-function framework. Our method, called FAStEN, combines functional data, optimization, and machine learning techniques to perform feature selection and parameter estimation simultaneously. We exploit the properties of Functional Principal Components and the sparsity inherent to the Dual Augmented Lagrangian problem to significantly reduce computational cost, and we introduce an adaptive scheme to improve selection accuracy. In addition, we derive asymptotic oracle properties, which guarantee estimation and selection consistency for the proposed FAStEN estimator. Through an extensive simulation study, we benchmark our approach against the best existing competitors and demonstrate a massive gain in terms of CPU time and selection performance, without sacrificing the quality of the coefficient estimation. The theoretical derivations and the simulation study provide a strong motivation for our approach. Finally, we present an application to brain fMRI data from the AOMIC PIOP1 study. Complete FAStEN code is provided at https://github.com/IBM/funGCN. Supplementary materials for this article are available online.
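
The general idea in the description (project each functional feature onto a few Functional Principal Components, then select whole features through a group-sparse penalty) can be illustrated with a small scalar-on-function sketch. This is not the FAStEN algorithm and does not use the funGCN API: a plain proximal-gradient group lasso stands in for the Dual Augmented Lagrangian solver, there is no adaptive step, and every name, parameter value, and the simulated data below are assumptions made for illustration only.

```python
import numpy as np

def fpc_scores(X, k):
    """Scores of the curves in X (n x T grid values) on their first k principal components."""
    Xc = X - X.mean(axis=0)                      # center each grid point
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k] / np.sqrt(X.shape[1])  # rescale to mimic an L2 inner product

def group_soft_threshold(b, t):
    """Proximal operator of t * ||b||_2: shrink the whole group, or zero it out."""
    norm = np.linalg.norm(b)
    return np.zeros_like(b) if norm <= t else (1.0 - t / norm) * b

def group_lasso(Z, y, k, lam, n_iter=500):
    """Minimize 1/(2n) ||y - Z b||^2 + lam * sum_j ||b_j||_2 by proximal gradient.

    Columns of Z are grouped in consecutive blocks of size k (one block per feature).
    """
    n, d = Z.shape
    step = n / np.linalg.norm(Z, 2) ** 2         # 1 / Lipschitz constant of the gradient
    b = np.zeros(d)
    for _ in range(n_iter):
        v = b - step * (Z.T @ (Z @ b - y) / n)   # gradient step on the squared loss
        for j in range(d // k):
            block = slice(j * k, (j + 1) * k)
            b[block] = group_soft_threshold(v[block], step * lam)
    return b.reshape(-1, k)                      # one row of coefficients per feature

# Hypothetical simulated data: p functional features, only features 0 and 1 are active.
rng = np.random.default_rng(0)
n, p, T, k = 200, 20, 50, 3
t = np.linspace(0.0, 1.0, T)
basis = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), np.sin(4 * np.pi * t)])
X = rng.normal(size=(n, p, 3)) @ basis           # shape (n, p, T)
y = X[:, 0] @ basis[0] / T - X[:, 1] @ basis[1] / T + 0.05 * rng.normal(size=n)
y = y - y.mean()

Z = np.hstack([fpc_scores(X[:, j], k) for j in range(p)])   # shape (n, p * k)
B = group_lasso(Z, y, k, lam=0.18)
selected = [j for j in range(p) if np.linalg.norm(B[j]) > 0]
print("selected features:", selected)
```

The penalty level `lam=0.18` was picked by hand for this toy setup; in practice it would be tuned, for example by cross-validation or an information criterion.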
Provider:
Taylor & Francis
Created:
2024-09-27