Title: Delayed Bandit Online Learning with Unknown Delays

Jul 09, 2018

Bingcong Li, Tianyi Chen, Georgios B. Giannakis

Abstract: This paper studies bandit learning problems with delayed feedback, which include multi-armed bandit (MAB) and bandit convex optimization (BCO). Given only function value information (a.k.a. bandit feedback), algorithms for both MAB and BCO typically rely on (possibly randomized) gradient estimators based on function values, and then feed them into well-studied gradient-based algorithms. Unlike existing works, however, the setting considered here is more challenging: the bandit feedback is not only delayed, but the presence of its delay is also not revealed to the learner. Existing algorithms for delayed MAB and BCO become intractable in this setting. To tackle such challenging settings, DEXP3 and DBGD have been developed for MAB and BCO, respectively. Leveraging a unified analysis framework, it is established that both DEXP3 and DBGD guarantee an ${\cal O}\big( \sqrt{T+D} \big)$ regret over $T$ time slots, with $D$ being the overall delay accumulated over slots. The new regret bounds match those in full information settings.

* 22 pages, 5 figures
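
To make the MAB setting in the abstract concrete, the following is a minimal Python sketch of an EXP3-style learner whose bandit feedback arrives late and without any indication of how late. It is an illustration under stated assumptions, not the paper's DEXP3: the function name `delayed_exp3`, the parameters `step_size` and `explore`, the callbacks `loss_fn` and `delay_fn`, and the choice to importance-weight each late loss by the current sampling probability (because the slot in which the arm was played is unknown) are all assumptions of this sketch.

```python
import math
import random


def delayed_exp3(loss_fn, delay_fn, num_arms, horizon,
                 step_size=0.05, explore=0.05, seed=0):
    """EXP3-style learner with delayed bandit feedback (illustrative sketch).

    loss_fn(t, arm) -> loss in [0, 1]; delay_fn(t) -> non-negative int delay.
    Both are hypothetical callbacks that simulate the environment.
    """
    rng = random.Random(seed)
    log_weights = [0.0] * num_arms
    probs = [1.0 / num_arms] * num_arms
    pending = {}  # arrival slot -> [(arm, loss)]; this schedule is hidden from the learner
    total_loss = 0.0

    for t in range(horizon):
        # Softmax over log-weights (shifted by the max for numerical stability),
        # mixed with uniform exploration so every probability is >= explore / num_arms.
        top = max(log_weights)
        exp_w = [math.exp(w - top) for w in log_weights]
        norm = sum(exp_w)
        probs = [(1.0 - explore) * w / norm + explore / num_arms for w in exp_w]

        # Play one arm; its loss is incurred now but observed only after a delay.
        arm = rng.choices(range(num_arms), weights=probs)[0]
        loss = loss_fn(t, arm)
        total_loss += loss
        pending.setdefault(t + delay_fn(t), []).append((arm, loss))

        # Feedback arriving in this slot carries no timestamp, so the sketch
        # importance-weights it with the CURRENT probability (an assumption).
        for fb_arm, fb_loss in pending.pop(t, []):
            log_weights[fb_arm] -= step_size * fb_loss / probs[fb_arm]

    return total_loss, probs
```

Calling `delayed_exp3(lambda t, a: 0.1 if a == 0 else 0.9, lambda t: t % 7, num_arms=3, horizon=2000)` simulates three arms with fixed losses and delays that vary between 0 and 6 slots; feedback scheduled to arrive after the horizon is simply never seen. The uniform exploration term is the classic EXP3 safeguard and is used here only to keep the importance weights bounded.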
