Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox


Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms

Sep 20, 2020
Mengfan Xu, Diego Klabjan



EXP-based algorithms are often used for exploration in multi-armed bandit. We revisit the EXP3.P algorithm and establish both the lower and upper bounds of regret in the Gaussian multi-armed bandit setting, as well as a more general distribution option. The analyses do not require bounded rewards compared to classical regret assumptions. We also extend EXP4 from multi-armed bandit to reinforcement learning to incentivize exploration by multiple agents. The resulting algorithm has been tested on hard-to-explore games and it shows an improvement on exploration compared to state-of-the-art.

* 20 pages, 8 figures 


Share this with someone who'll enjoy it:

   Access Paper Source



Share this with someone who'll enjoy it: