Asynchronous and Distributed Data Augmentation for Massive Data Settings

Figshare · Updated 2022-10-03 · Indexed 2026-04-28

Download link:

https://figshare.com/articles/dataset/Asynchronous_and_Distributed_Data_Augmentation_for_Massive_Data_Settings/21263191

Description:

Data augmentation (DA) algorithms are slow in massive data settings because they require multiple passes through the entire data set. We address this problem by developing a DA extension that exploits asynchronous and distributed computing. The extended algorithm is called Asynchronous and Distributed DA (ADDA), and the original DA algorithm is its parent. Any ADDA is indexed by a parameter r∈(0,1) and starts by dividing the entire data into k disjoint subsets and storing them on k processes. In every iteration, ADDA augments only an r-fraction of the k data subsets with some positive probability and leaves the augmented data of the remaining (1−r)-fraction unchanged. The parameter draws are then obtained using the r-fraction of new and the (1−r)-fraction of old augmented data. We show that the ADDA Markov chain is Harris ergodic with the desired stationary distribution under mild conditions on the parent DA algorithm. We demonstrate that ADDA is significantly faster than its parent for many (k, r) choices in three representative models. We also establish the geometric ergodicity of the ADDA Markov chain for all three models, which yields asymptotically valid standard errors for estimates of desired posterior quantities. Supplementary materials for this article are available online.
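To make the iteration described above concrete, here is a minimal serial sketch in Python (NumPy) under stated assumptions. It is not the authors' implementation. The toy model (y_i | z_i ~ N(z_i, 1), z_i | μ ~ N(μ, 1), flat prior on μ), the function names `adda_toy` and `batch_means_se`, and the rule of refreshing exactly round(r·k) randomly chosen subsets per iteration are all hypothetical. Real ADDA runs the k processes asynchronously, so which subsets carry new augmented data depends on which processes report back first; here that is mimicked by a random draw on a single machine. Because this serial version is a random-scan Gibbs sampler on (z, μ), its μ-draws target the exact posterior N(ȳ, 2/n). The standard error uses batch means, a common estimator for geometrically ergodic chains that may differ from what the paper uses.

```python
import numpy as np


def adda_toy(y, k, r, n_iter, rng):
    """Serial sketch of an ADDA-style sampler for a hypothetical Gaussian toy model.

    Model (assumption): y_i | z_i ~ N(z_i, 1), z_i | mu ~ N(mu, 1), flat prior on mu.
    """
    n = y.size
    blocks = np.array_split(np.arange(n), k)  # k disjoint data subsets ("processes")
    m = max(1, int(round(r * k)))             # subsets refreshed per iteration (assumption)
    z = y.copy()                              # initial augmented data
    mu = y.mean()
    draws = np.empty(n_iter)
    for t in range(n_iter):
        # Augmentation step: redraw z only on m randomly chosen subsets,
        # from z_i | mu, y_i ~ N((mu + y_i) / 2, 1/2); other subsets keep their old z.
        for j in rng.choice(k, size=m, replace=False):
            idx = blocks[j]
            z[idx] = rng.normal((mu + y[idx]) / 2.0, np.sqrt(0.5))
        # Parameter step: mu | z ~ N(mean(z), 1/n), using new and old augmented data.
        mu = rng.normal(z.mean(), np.sqrt(1.0 / n))
        draws[t] = mu
    return draws


def batch_means_se(x, n_batches=30):
    """Batch-means Monte Carlo standard error of the sample mean of x."""
    b = x.size // n_batches
    means = x[: b * n_batches].reshape(n_batches, b).mean(axis=1)
    return np.sqrt(means.var(ddof=1) / n_batches)


rng = np.random.default_rng(0)
y = rng.normal(3.0, np.sqrt(2.0), size=1000)  # simulated data, marginally N(mu, 2)
draws = adda_toy(y, k=10, r=0.5, n_iter=20000, rng=rng)[1000:]  # drop burn-in
se = batch_means_se(draws)
print(f"posterior mean estimate {draws.mean():.4f} (MC s.e. {se:.4f}); exact {y.mean():.4f}")
print(f"posterior variance estimate {draws.var():.5f}; exact {2 / y.size:.5f}")
```

With r = 1 every subset is refreshed and the sketch reduces to the parent DA sampler; in this toy model a smaller r does less augmentation work per iteration at the cost of higher autocorrelation between successive μ-draws.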

Created:

2022-10-03
