Online Statistical Inference for Contextual Bandits via Stochastic Gradient Descent
Source: Figshare. Updated: 2026-01-30. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Online_Statistical_Inference_for_Contextual_Bandits_via_Stochastic_Gradient_Descent/31211937
Description:
With the rapid development of big data, learning the optimal decision rule by recursively updating it and making online decisions has become easier than before. We study the online statistical inference of model parameters in a contextual bandit framework of sequential decision-making. We propose a general framework for an online and adaptive data collection environment that can update decision rules via weighted stochastic gradient descent. We allow different weighting schemes of the stochastic gradient and establish the asymptotic normality of the parameter estimator. Our proposed estimator significantly improves the asymptotic efficiency over the previous averaged SGD approach via inverse probability weights. We also conduct an optimality analysis on the weights in a linear regression setting. We provide a Bahadur representation of the proposed estimator and show that the remainder term in the Bahadur representation entails a slower convergence rate compared to classical SGD due to the adaptive data collection. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
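As an illustration of the weighted SGD idea described above, the following is a minimal sketch under stated assumptions, not the authors' estimator. It assumes a two-arm linear reward model, bounded contexts, an epsilon-greedy policy, a polynomially decaying step size, and running (Polyak-style) averaging of the iterates. The function name `run_weighted_sgd`, the parameter `weight_fn`, and all numeric constants are hypothetical. Inverse probability weighting, the scheme named in the description, is used as the default weight.

```python
import numpy as np

def run_weighted_sgd(n_steps=20000, eps=0.5, weight_fn=lambda p: 1.0 / p, seed=0):
    """Weighted SGD with iterate averaging in a toy two-arm linear bandit.

    Everything here is an illustrative assumption: the reward model,
    the epsilon-greedy policy, the step-size schedule, and the constants.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical true parameters, one row per arm.
    theta_true = np.array([[1.0, -0.5, 0.3],
                           [-0.4, 0.8, 0.6]])
    n_arms, dim = theta_true.shape
    theta = np.zeros((n_arms, dim))      # current SGD iterate
    theta_bar = np.zeros((n_arms, dim))  # running average of the iterates

    for t in range(1, n_steps + 1):
        x = rng.uniform(-1.0, 1.0, size=dim)   # bounded context
        # Epsilon-greedy action probabilities based on the current iterate.
        greedy = int(np.argmax(theta @ x))
        probs = np.full(n_arms, eps / n_arms)
        probs[greedy] += 1.0 - eps
        a = int(rng.choice(n_arms, p=probs))
        # Noisy linear reward for the chosen arm.
        y = theta_true[a] @ x + rng.normal(scale=0.5)
        # Weighted stochastic gradient of the squared loss for arm a.
        lr = 0.3 * t ** -0.6
        grad = (theta[a] @ x - y) * x
        theta[a] -= lr * weight_fn(probs[a]) * grad
        # Update the running average of all iterates.
        theta_bar += (theta - theta_bar) / t

    return theta_true, theta_bar

if __name__ == "__main__":
    truth, estimate = run_weighted_sgd()
    print("max abs error:", np.max(np.abs(estimate - truth)))
```

Passing a different `weight_fn`, for example `lambda p: 1.0` for unweighted updates, lets one compare weighting schemes empirically in this toy setting. The sketch makes no claim about which scheme is optimal.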
Created:
2026-01-30



