Alert button

Secure-UCB: Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification

Feb 15, 2021
Anshuka Rangi, Long Tran-Thanh, Haifeng Xu, Massimo Franceschetti

Figure 1 for Secure-UCB: Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification
Figure 2 for Secure-UCB: Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification

Share this with someone who'll enjoy it:

This paper studies bandit algorithms under data poisoning attacks in a bounded reward setting. We consider a strong attacker model in which the attacker can observe both the selected actions and their corresponding rewards, and can contaminate the rewards with additive noise. We show that \emph{any} bandit algorithm with regret $O(\log T)$ can be forced to suffer a regret $\Omega(T)$ with an expected amount of contamination $O(\log T)$. This amount of contamination is also necessary, as we prove that there exists an $O(\log T)$ regret bandit algorithm, specifically the classical UCB, that requires $\Omega(\log T)$ amount of contamination to suffer regret $\Omega(T)$. To combat such poising attacks, our second main contribution is to propose a novel algorithm, Secure-UCB, which uses limited \emph{verification} to access a limited number of uncontaminated rewards. We show that with $O(\log T)$ expected number of verifications, Secure-UCB can restore the order optimal $O(\log T)$ regret \emph{irrespective of the amount of contamination} used by the attacker. Finally, we prove that for any bandit algorithm, this number of verifications $O(\log T)$ is necessary to recover the order-optimal regret. We can then conclude that Secure-UCB is order-optimal in terms of both the expected regret and the expected number of verifications, and can save stochastic bandits from any data poisoning attack.

View paper onarxiv icon

Share this with someone who'll enjoy it: