Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms

Sep 20, 2020
Mengfan Xu, Diego Klabjan

EXP-based algorithms are often used for exploration in multi-armed bandit. We revisit the EXP3.P algorithm and establish both the lower and upper bounds of regret in the Gaussian multi-armed bandit setting, as well as a more general distribution option. The analyses do not require bounded rewards compared to classical regret assumptions. We also extend EXP4 from multi-armed bandit to reinforcement learning to incentivize exploration by multiple agents. The resulting algorithm has been tested on hard-to-explore games and it shows an improvement on exploration compared to state-of-the-art.

* 20 pages, 8 figures 

Share this with someone who'll enjoy it:

   Access Paper Source

Share this with someone who'll enjoy it: