
Asynchronous and Distributed Data Augmentation for Massive Data Settings

Source: Figshare · Updated 2022-10-03 · Indexed 2026-04-28
Download link: https://figshare.com/articles/dataset/Asynchronous_and_Distributed_Data_Augmentation_for_Massive_Data_Settings/21263191
Resource summary:
Data augmentation (DA) algorithms are slow in massive data settings due to multiple passes through the entire data. We address this problem by developing a DA extension that exploits asynchronous and distributed computing. The extended DA algorithm is called Asynchronous and Distributed (AD) DA with the original DA as its parent. Any ADDA is indexed by a parameter r∈(0,1) and starts by dividing the entire data into k disjoint subsets and storing them on k processes. Every iteration of ADDA augments only an r-fraction of the k data subsets with some positive probability and leaves the remaining (1−r)-fraction of the augmented data unchanged. The parameter draws are obtained using the r-fraction of new and (1−r)-fraction of old augmented data. We show that the ADDA Markov chain is Harris ergodic with the desired stationary distribution under mild conditions on the parent DA algorithm. We demonstrate that ADDA is significantly faster than its parent for many (k, r) choices in three representative models. We also establish the geometric ergodicity of the ADDA Markov chain for all three models, which yields asymptotically valid standard errors for estimates of desired posterior quantities. Supplementary materials for this article are available online.
Created: 2022-10-03
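The ADDA scheme summarized above can be illustrated with a small single-process simulation. This is a sketch under stated assumptions, not the authors' implementation. The parent DA is assumed to be Albert and Chib's data augmentation for a one-coefficient probit model, chosen only for illustration and not necessarily one of the three models studied. The r-fraction of workers that "finish first" is stood in for by a uniformly random subset of ceil(r·k) shards, and those shards redraw their latent variables using the current β rather than the stale value a truly asynchronous worker might hold. Each shard keeps only its sufficient statistic Σ x·z, so every β draw combines fresh statistics from the refreshed shards with stale ones from the rest. The helper `batch_means_se` shows one common way (batch means with batch size about √n) to turn draws from a geometrically ergodic chain into a Monte Carlo standard error; the paper may use a different estimator. All function names here are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for inverse-CDF sampling


def truncated_normal(mean, positive, rng):
    """Draw Z ~ N(mean, 1) restricted to Z > 0 if positive, else Z <= 0 (inverse-CDF method)."""
    p0 = _STD.cdf(-mean)  # P(Z <= 0)
    u = rng.uniform(p0, 1.0) if positive else rng.uniform(0.0, p0)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf finite at the endpoints
    return mean + _STD.inv_cdf(u)


def augment_shard(shard, beta, rng):
    """Hypothetical worker step: redraw the latent Z's of one shard given beta
    and return the shard's sufficient statistic sum(x * z)."""
    return sum(x * truncated_normal(x * beta, y == 1, rng) for x, y in shard)


def adda_probit(shards, r, n_iter, tau2=100.0, seed=0):
    """Single-process simulation of an ADDA-style sampler for the probit model
    y = 1{x * beta + e > 0}, e ~ N(0, 1), with prior beta ~ N(0, tau2)."""
    if not 0.0 < r <= 1.0:
        raise ValueError("r must lie in (0, 1]; r = 1 recovers the parent DA")
    rng = random.Random(seed)
    k = len(shards)
    m = max(1, math.ceil(r * k))  # number of shards refreshed per iteration
    prec = sum(x * x for s in shards for x, _ in s) + 1.0 / tau2  # posterior precision of beta
    beta = 0.0
    sxz = [augment_shard(s, beta, rng) for s in shards]  # one full augmentation pass to start
    draws = []
    for _ in range(n_iter):
        # Stand-in for "the first m workers to finish": a uniformly random subset of shards.
        for j in rng.sample(range(k), m):
            sxz[j] = augment_shard(shards[j], beta, rng)
        # Manager step: combine fresh and stale shard statistics, then draw beta | Z.
        beta = rng.gauss(sum(sxz) / prec, math.sqrt(1.0 / prec))
        draws.append(beta)
    return draws


def batch_means_se(draws):
    """Batch-means Monte Carlo standard error of the mean of `draws` (batch size ~ sqrt(n))."""
    if len(draws) < 4:
        raise ValueError("need at least 4 draws")
    b = int(math.sqrt(len(draws)))  # batch size
    a = len(draws) // b  # number of batches; trailing draws that do not fill a batch are dropped
    means = [sum(draws[i * b:(i + 1) * b]) / b for i in range(a)]
    grand = sum(means) / a
    var_means = sum((mu - grand) ** 2 for mu in means) / (a - 1)
    return math.sqrt(var_means / a)


def simulate_probit_data(n, beta, seed):
    """Synthetic (x, y) pairs from the probit model, for illustration only."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        data.append((x, 1 if x * beta + rng.gauss(0.0, 1.0) > 0 else 0))
    return data


if __name__ == "__main__":
    data = simulate_probit_data(1000, beta=1.5, seed=1)
    k = 10
    shards = [data[i::k] for i in range(k)]  # k disjoint subsets
    kept = adda_probit(shards, r=0.3, n_iter=1500)[500:]  # drop burn-in
    print(f"posterior mean {sum(kept) / len(kept):.3f} +/- {batch_means_se(kept):.3f}")
```

Setting r = 1 refreshes every shard on each iteration and recovers the parent DA, which makes it easy to compare the two samplers on the same shards. Because the latent variables are conditionally independent across shards given β, updating a randomly chosen subset of them from their full conditional and then drawing β from its full conditional is a random-subset Gibbs update, so in this simplified setting the chain still targets the posterior.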