Title: A One-Size-Fits-All Solution to Conservative Bandit Problems

Dec 16, 2020

Yihan Du, Siwei Wang, Longbo Huang

Abstract: In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's reward performance must be at least as good as a given baseline at any time. We propose a One-Size-Fits-All solution to CBPs and present its applications to three encompassed problems, i.e., conservative multi-armed bandits (CMAB), conservative linear bandits (CLB) and conservative contextual combinatorial bandits (CCCB). Unlike previous works, which consider high-probability constraints on the expected reward, we focus on a sample-path constraint on the actually received reward, and achieve better theoretical guarantees ($T$-independent additive regrets instead of $T$-dependent ones) and better empirical performance. Furthermore, we extend the results to a novel conservative mean-variance bandit problem (MV-CBP), which measures learning performance by both the expected reward and its variability. For this extended problem, we provide a novel algorithm with $O(1/T)$ normalized additive regrets ($T$-independent in the cumulative form) and validate this result through empirical evaluation.
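To make the sample-path constraint concrete, here is a minimal Python sketch. It is not the algorithm proposed in the paper; it is a generic UCB learner wrapped in a safety check. Everything specific in it is an assumption made for illustration: the function name `run_conservative_ucb`, Bernoulli rewards in [0, 1], a baseline arm with a known deterministic reward `baseline_reward`, and a tolerance parameter `alpha` so that the requirement reads "cumulative received reward at round t is at least (1 - alpha) * baseline_reward * t".

```python
import math
import random


def ucb_index(mean_sum, count, t):
    """Standard UCB1 index; unplayed arms get +inf so they are tried first."""
    if count == 0:
        return float("inf")
    return mean_sum / count + math.sqrt(2.0 * math.log(t) / count)


def run_conservative_ucb(means, baseline_reward, alpha, horizon, seed=0):
    """Illustrative conservative bandit loop (hypothetical, not the paper's method).

    means           -- true Bernoulli means of the exploratory arms (assumed)
    baseline_reward -- known deterministic reward of the baseline arm (assumed)
    alpha           -- fraction of baseline reward we may give up (assumed)
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # pulls per exploratory arm
    sums = [0.0] * k      # observed reward per exploratory arm
    total = 0.0           # cumulative reward actually received
    history = []          # (cumulative reward, required threshold) per round

    for t in range(1, horizon + 1):
        threshold = (1.0 - alpha) * baseline_reward * t
        candidate = max(range(k), key=lambda i: ucb_index(sums[i], counts[i], t))

        # Rewards are non-negative, so the worst case for this round is 0.
        # Exploring is safe only if the constraint already holds without it.
        if total >= threshold:
            reward = 1.0 if rng.random() < means[candidate] else 0.0
            counts[candidate] += 1
            sums[candidate] += reward
        else:
            reward = baseline_reward  # fall back to the baseline arm

        total += reward
        history.append((total, threshold))

    return history, counts


if __name__ == "__main__":
    history, counts = run_conservative_ucb([0.3, 0.7], 0.5, 0.1, 2000, seed=1)
    violations = sum(1 for total, thr in history if total < thr - 1e-9)
    print("rounds violating the constraint:", violations)
    print("exploratory pulls per arm:", counts)
```

Because the check uses the realized cumulative reward and the worst-case outcome of the next pull, the constraint holds on every sample path rather than only in expectation or with high probability, which is the distinction the abstract draws.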
