Statistically Efficient Thinning of a Markov Chain Sampler

Name: Statistically Efficient Thinning of a Markov Chain Sampler
Creator: Taylor & Francis
Published: 2020-09-01 13:04:43
License: No description available

DataCite Commons | Updated 2020-09-01 | Indexed 2024-08-17

Download link:

https://tandf.figshare.com/articles/dataset/Statistically_Efficient_Thinning_of_a_Markov_Chain_Sampler/5266405/2

Description:

It is common to subsample Markov chain output to reduce the storage burden. Geyer shows that discarding k − 1 out of every k observations will not improve statistical efficiency, as quantified through variance in a given computational budget. That observation is often taken to mean that thinning Markov chain Monte Carlo (MCMC) output cannot improve statistical efficiency. Here, we suppose that it costs one unit of time to advance a Markov chain and then θ > 0 units of time to compute a sampled quantity of interest. For a thinned process, that cost θ is incurred less often, so it can be advanced through more stages. Here, we provide examples to show that thinning will improve statistical efficiency if θ is large and the sample autocorrelations decay slowly enough. If the lag ℓ ⩾ 1 autocorrelations of a scalar measurement satisfy ρ_ℓ > ρ_{ℓ+1} > 0, then there is always a θ < ∞ at which thinning becomes more efficient for averages of that scalar. Many sample autocorrelation functions resemble those of first-order AR(1) processes with ρ_ℓ = ρ^{|ℓ|} for some −1 < ρ < 1. For an AR(1) process, it is possible to compute the most efficient subsampling frequency k. The optimal k grows rapidly as ρ increases toward 1. The resulting efficiency gain depends primarily on θ, not ρ. Taking k = 1 (no thinning) is optimal when ρ ⩽ 0. For ρ > 0, it is optimal if and only if θ ⩽ (1 − ρ)^2/(2ρ). This efficiency gain never exceeds 1 + θ. This article also gives efficiency bounds for autocorrelations bounded between those of two AR(1) processes. Supplementary materials for this article are available online.
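
The AR(1) case described above can be illustrated with a minimal sketch. It is not taken from the article or its supplementary materials. It assumes the standard large-sample variance factor (1 + ρ^k)/(1 − ρ^k) for the mean of an AR(1) series kept at every k-th step, combined with the abstract's cost model of k + θ time units per retained draw. The function names and the brute-force search limit `k_max` are hypothetical choices made for this example.

```python
def variance_factor(k, rho, theta):
    """Asymptotic variance of the sample mean per unit of budget, up to a
    constant, when every k-th draw of an AR(1) chain is kept.

    Assumed model: each kept draw costs k + theta time units, and the
    thinned series has lag-l autocorrelation rho ** (k * l).
    """
    r = rho ** k
    return (k + theta) * (1 + r) / (1 - r)


def relative_efficiency(k, rho, theta):
    """Efficiency of thinning factor k relative to no thinning (k = 1)."""
    return variance_factor(1, rho, theta) / variance_factor(k, rho, theta)


def best_thinning(rho, theta, k_max=10_000):
    """Brute-force search for the thinning factor with the smallest
    variance factor among k = 1, ..., k_max (k_max is an arbitrary cap)."""
    return min(range(1, k_max + 1),
               key=lambda k: variance_factor(k, rho, theta))


if __name__ == "__main__":
    rho, theta = 0.9, 1.0
    k = best_thinning(rho, theta)
    print("threshold on theta for k = 1:", (1 - rho) ** 2 / (2 * rho))
    print("best k:", k)
    print("efficiency gain:", relative_efficiency(k, rho, theta))
```

Under these assumptions, comparing k = 2 with k = 1 reproduces the abstract's condition: thinning helps exactly when θ > (1 − ρ)^2/(2ρ). The search is a plain enumeration rather than the article's own procedure for finding the optimal k.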

Provider:

Taylor & Francis

Created:

2019-10-25
