Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

OSOM: A Simultaneously Optimal Algorithm for Multi-Armed and Linear Contextual Bandits

May 24, 2019
Niladri S. Chatterji, Vidya Muthukumar, Peter L. Bartlett

We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden \textit{simple multi-armed bandit} structure in which the rewards are independent of the contextual information. Algorithms that are designed solely for one of the regimes are known to be sub-optimal for their alternate regime. We design a single computationally efficient algorithm that simultaneously obtains problem-dependent optimal regret rates in the simple multi-armed bandit regime and minimax optimal regret rates in the linear contextual bandit regime, without knowing a priori which of the two models generates the rewards. These results are proved under the condition of stochasticity of contextual information over multiple rounds. Our results should be viewed as a step towards principled data-dependent policy class selection for contextual bandits.

Share this with someone who'll enjoy it:

   Access Paper Source

Share this with someone who'll enjoy it: