We focus on the problem of finding an optimal strategy for a team of two players that faces an opponent in an imperfect-information zero-sum extensive-form game. Team members are not allowed to communicate during play but can coordinate before the game. In that setting, it is known that the best the team can do is sample a profile of potentially randomized strategies (one per player) from a joint (a.k.a. correlated) probability distribution at the beginning of the game. In this paper, we first provide new modeling results about computing such an optimal distribution by drawing a connection to a different literature on extensive-form correlation. Second, we provide an algorithm that computes such an optimal distribution by only using profiles where only one of the team members gets to randomize in each profile. We can also cap the number of such profiles we allow in the solution. This begets an anytime algorithm by increasing the cap. We find that often a handful of well-chosen such profiles suffices to reach optimal utility for the team. This enables team members to reach coordination through a relatively simple and understandable plan. Finally, inspired by this observation and leveraging theoretical concepts that we introduce, we develop an efficient column-generation algorithm for finding an optimal distribution for the team. We evaluate it on a suite of common benchmark games. It is three orders of magnitude faster than the prior state of the art on games that the latter can solve and it can also solve several games that were previously unsolvable.
Recently, there has been growing interest around less-restrictive solution concepts than Nash equilibrium in extensive-form games, with significant effort towards the computation of extensive-form correlated equilibrium (EFCE) and extensive-form coarse correlated equilibrium (EFCCE). In this paper, we show how to leverage the popular counterfactual regret minimization (CFR) paradigm to induce simple no-regret dynamics that converge to the set of EFCEs and EFCCEs in an n-player general-sum extensive-form games. For EFCE, we define a notion of internal regret suitable for extensive-form games and exhibit an efficient no-internal-regret algorithm. These results complement those for normal-form games introduced in the seminal paper by Hart and Mas-Colell. For EFCCE, we show that no modification of CFR is needed, and that in fact the empirical frequency of play generated when all the players use the original CFR algorithm converges to the set of EFCCEs.
Network congestion games are a well-understood model of multi-agent strategic interactions. Despite their ubiquitous applications, it is not clear whether it is possible to design information structures to ameliorate the overall experience of the network users. We focus on Bayesian games with atomic players, where network vagaries are modeled via a (random) state of nature which determines the costs incurred by the players. A third-party entity---the sender---can observe the realized state of the network and exploit this additional information to send a signal to each player. A natural question is the following: is it possible for an informed sender to reduce the overall social cost via the strategic provision of information to players who update their beliefs rationally? The paper focuses on the problem of computing optimal ex ante persuasive signaling schemes, showing that symmetry is a crucial property for its solution. Indeed, we show that an optimal ex ante persuasive signaling scheme can be computed in polynomial time when players are symmetric and have affine cost functions. Moreover, the problem becomes NP-hard when players are asymmetric, even in non-Bayesian settings.
Persuasion studies how an informed principal may influence the behavior of agents by the strategic provision of payoff-relevant information. We focus on the fundamental multi-receiver model by Arieli and Babichenko (2019), in which there are no inter-agent externalities. Unlike prior works on this problem, we study the public persuasion problem in the general setting with: (i) arbitrary state spaces; (ii) arbitrary action spaces; (iii) arbitrary sender's utility functions. We fully characterize the computational complexity of computing a bi-criteria approximation of an optimal public signaling scheme. In particular, we show, in a voting setting of independent interest, that solving this problem requires at least a quasi-polynomial number of steps even in settings with a binary action space, assuming the Exponential Time Hypothesis. In doing so, we prove that a relaxed version of the Maximum Feasible Subsystem of Linear Inequalities problem requires at least quasi-polynomial time to be solved. Finally, we close the gap by providing a quasi-polynomial time bi-criteria approximation algorithm for arbitrary public persuasion problems that, in specific settings, yields a QPTAS.
Many real-world applications involve teams of agents that have to coordinate their actions to reach a common goal against potential adversaries. This paper focuses on zero-sum games where a team of players faces an opponent, as is the case, for example, in Bridge, collusion in poker, and collusion in bidding. The possibility for the team members to communicate before gameplay---that is, coordinate their strategies ex ante---makes the use of behavioral strategies unsatisfactory. We introduce Soft Team Actor-Critic (STAC) as a solution to the team's coordination problem that does not require any prior domain knowledge. STAC allows team members to effectively exploit ex ante communication via exogenous signals that are shared among the team. STAC reaches near-optimal coordinated strategies both in perfectly observable and partially observable games, where previous deep RL algorithms fail to reach optimal coordinated behaviors.
We focus on the following natural question: is it possible to influence the outcome of a voting process through the strategic provision of information to voters who update their beliefs rationally? We investigate whether it is computationally tractable to design a signaling scheme maximizing the probability with which the sender's preferred candidate is elected. We focus on the model recently introduced by Arieli and Babichenko (2019) (i.e., without inter-agent externalities), and consider, as explanatory examples, $k$-voting rule and plurality voting. There is a sharp contrast between the case in which private signals are allowed and the more restrictive setting in which only public signals are allowed. In the former, we show that an optimal signaling scheme can be computed efficiently both under a $k$-voting rule and plurality voting. In establishing these results, we provide two general (i.e., applicable to settings beyond voting) contributions. Specifically, we extend a well known result by Dughmi and Xu (2017) to more general settings, and prove that, when the sender's utility function is anonymous, computing an optimal signaling scheme is fixed parameter tractable w.r.t. the number of receivers' actions. In the public signaling case, we show that the sender's optimal expected return cannot be approximated to within any factor under a $k$-voting rule. This negative result easily extends to plurality voting and problems where utility functions are anonymous.
We study an information-structure design problem (a.k.a. persuasion) with a single sender and multiple receivers with actions of a priori unknown types, independently drawn from action-specific marginal distributions. As in the standard Bayesian persuasion model, the sender has access to additional information regarding the action types, which she can exploit when committing to a (noisy) signaling scheme through which she sends a private signal to each receiver. The novelty of our model is in considering the case where the receivers interact in a sequential game with imperfect information, with utilities depending on the game outcome and the realized action types. After formalizing the notions of ex ante and ex interim persuasiveness (which differ in the time at which the receivers commit to following the sender's signaling scheme), we investigate the continuous optimization problem of computing a signaling scheme which maximizes the sender's expected revenue. We show that computing an optimal ex ante persuasive signaling scheme is NP-hard when there are three or more receivers. In contrast with previous hardness results for ex interim persuasion, we show that, for games with two receivers, an optimal ex ante persuasive signaling scheme can be computed in polynomial time thanks to a novel algorithm based on the ellipsoid method which we propose.
We investigate the computation of equilibria in extensive-form games where ex ante correlation is possible, focusing on correlated equilibria requiring the least amount of communication between the players and the mediator. Motivated by the hardness results on the computation of normal-form correlated equilibria, we introduce the notion of normal-form coarse correlated equilibrium, extending the definition of coarse correlated equilibrium to sequential games. We show that, in two-player games without chance moves, an optimal (e.g., social welfare maximizing) normal-form coarse correlated equilibrium can be computed in polynomial time, and that in general multi-player games (including two-player games with Chance), the problem is NP-hard. For the former case, we provide a polynomial-time algorithm based on the ellipsoid method and also propose a more practical one, which can be efficiently applied to problems of considerable size. Then, we discuss how our algorithm can be extended to games with Chance and games with more than two players.