Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits

Mar 16, 2023

Weitong Zhang, Jiafan He, Zhiyuan Fan, Quanquan Gu

Figure 1 for On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits

Figure 2 for On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits

Figure 3 for On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits

Figure 4 for On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits

Share this with someone who'll enjoy it:

Abstract:We study linear contextual bandits in the misspecified setting, where the expected reward function can be approximated by a linear function class up to a bounded misspecification level $\zeta>0$. We propose an algorithm based on a novel data selection scheme, which only selects the contextual vectors with large uncertainty for online regression. We show that, when the misspecification level $\zeta$ is dominated by $\tilde O (\Delta / \sqrt{d})$ with $\Delta$ being the minimal sub-optimality gap and $d$ being the dimension of the contextual vectors, our algorithm enjoys the same gap-dependent regret bound $\tilde O (d^2/\Delta)$ as in the well-specified setting up to logarithmic factors. In addition, we show that an existing algorithm SupLinUCB (Chu et al., 2011) can also achieve a gap-dependent constant regret bound without the knowledge of sub-optimality gap $\Delta$. Together with a lower bound adapted from Lattimore et al. (2020), our result suggests an interplay between misspecification level and the sub-optimality gap: (1) the linear contextual bandit model is efficiently learnable when $\zeta \leq \tilde O(\Delta / \sqrt{d})$; and (2) it is not efficiently learnable when $\zeta \geq \tilde \Omega({\Delta} / {\sqrt{d}})$. Experiments on both synthetic and real-world datasets corroborate our theoretical results.

* 28 pages, 2 figures, 2 tables

View paper on

Share this with someone who'll enjoy it:

Title:On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits

Paper and Code