In this paper, we study a class of online optimization problems with long-term budget constraints where the objective functions are not necessarily concave (nor convex) but they instead satisfy the Diminishing Returns (DR) property. Specifically, a sequence of monotone DR-submodular objective functions $\{f_t(x)\}_{t=1}^T$ and monotone linear budget functions $\{\langle p_t,x \rangle \}_{t=1}^T$ arrive over time and assuming a total targeted budget $B_T$, the goal is to choose points $x_t$ at each time $t\in\{1,\dots,T\}$, without knowing $f_t$ and $p_t$ on that step, to achieve sub-linear regret bound while the total budget violation $\sum_{t=1}^T \langle p_t,x_t \rangle -B_T$ is sub-linear as well. Prior work has shown that achieving sub-linear regret is impossible if the budget functions are chosen adversarially. Therefore, we modify the notion of regret by comparing the agent against a $(1-\frac{1}{e})$-approximation to the best fixed decision in hindsight which satisfies the budget constraint proportionally over any window of length $W$. We propose the Online Saddle Point Hybrid Gradient (OSPHG) algorithm to solve this class of online problems. For $W=T$, we recover the aforementioned impossibility result. However, when $W=o(T)$, we show that it is possible to obtain sub-linear bounds for both the $(1-\frac{1}{e})$-regret and the total budget violation.

Share this with someone who'll enjoy it:

Access Paper Source