Abstract:We study online learning in Bayesian Stackelberg games, where a leader repeatedly interacts with a follower whose unknown private type is independently drawn at each round from an unknown probability distribution. The goal is to design algorithms that minimize the leader's regret with respect to always playing an optimal commitment computed with knowledge of the game. We consider, for the first time to the best of our knowledge, the most realistic case in which the leader does not know anything about the follower's types, i.e., the possible follower payoffs. This raises considerable additional challenges compared to the commonly studied case in which the payoffs of follower types are known. First, we prove a strong negative result: no-regret is unattainable under action feedback, i.e., when the leader only observes the follower's best response at the end of each round. Thus, we focus on the easier type feedback model, where the follower's type is also revealed. In such a setting, we propose a no-regret algorithm that achieves a regret of $\widetilde{O}(\sqrt{T})$, when ignoring the dependence on other parameters.