Divide and Recombine Approaches for Fitting Smoothing Spline Models with Large Datasets

Creator: Taylor & Francis
Published: 2020-08-31 12:26:29
License: not specified

DataCite Commons, updated 2020-08-31, indexed 2024-07-25

Download link:

https://tandf.figshare.com/articles/Divide_and_Recombine_Approaches_for_Fitting_Smoothing_Spline_Models_with_Large_Datasets/5635045


Description:

Spline smoothing is a widely used nonparametric method that allows data to speak for themselves. Because of its complexity and flexibility, fitting smoothing spline models is usually computationally intensive, which may become prohibitive with large datasets. To overcome memory and CPU limitations, we propose four divide and recombine (D&R) approaches for fitting cubic splines with large datasets. We consider two approaches to divide the data: random and sequential. For each approach of division, we consider two approaches to recombine. These D&R approaches are implemented in parallel without communication. Extensive simulations show that these D&R approaches are scalable and have performance comparable to that of the method that uses the whole data. The sequential D&R approaches are spatially adaptive, which leads to better performance than the method that uses the whole data when the underlying function is spatially inhomogeneous.

Provider:

Taylor & Francis

Created:

2017-11-27