Title: Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

Jul 13, 2022

Stephen McAleer, JB Lanier, Kevin Wang, Pierre Baldi, Roy Fox, Tuomas Sandholm

Abstract: In competitive two-agent environments, deep reinforcement learning (RL) methods based on the Double Oracle (DO) algorithm, such as Policy Space Response Oracles (PSRO) and Anytime PSRO (APSRO), iteratively add RL best response policies to a population. Eventually, an optimal mixture of these population policies will approximate a Nash equilibrium. However, these methods might need to add all deterministic policies before converging. In this work, we introduce Self-Play PSRO (SP-PSRO), a method that adds an approximately optimal stochastic policy to the population in each iteration. Instead of adding only deterministic best responses to the opponent's least exploitable population mixture, SP-PSRO also learns an approximately optimal stochastic policy and adds it to the population as well. As a result, SP-PSRO empirically tends to converge much faster than APSRO and in many games converges in just a few iterations.
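To make the abstract's remark that DO-style methods "might need to add all deterministic policies" concrete, here is a minimal tabular sketch of a Double Oracle loop on rock-paper-scissors. It is an illustration under stated assumptions, not the paper's method: the game, the function names `fictitious_play` and `double_oracle`, the use of fictitious play as the restricted-game solver, and the simple stopping rule are all choices made for this example, and exact best responses stand in for the RL best-response policies used by PSRO-style methods. SP-PSRO itself is not implemented here.

```python
# Illustrative tabular Double Oracle on rock-paper-scissors.
# Assumption: payoffs are for the row player; the column player receives the negation.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]


def fictitious_play(M, steps=5000):
    """Approximate an equilibrium of the zero-sum matrix game M.

    Assumed solver for this sketch; any restricted-game solver could be used.
    """
    n, m = len(M), len(M[0])
    row_counts = [0] * n
    col_counts = [0] * m
    row_counts[0] = col_counts[0] = 1  # arbitrary initial actions
    for _ in range(steps):
        # Cumulative payoff of each pure action against the opponent's history.
        row_vals = [sum(M[i][j] * col_counts[j] for j in range(m)) for i in range(n)]
        col_vals = [sum(M[i][j] * row_counts[i] for i in range(n)) for j in range(m)]
        # Row player maximizes, column player minimizes; ties go to the lowest index.
        row_counts[row_vals.index(max(row_vals))] += 1
        col_counts[col_vals.index(min(col_vals))] += 1
    total = steps + 1
    return [c / total for c in row_counts], [c / total for c in col_counts]


def double_oracle(A, max_iters=10):
    """Grow each player's population with pure best responses to the restricted mixture."""
    row_pop, col_pop = [0], [0]  # start both populations with a single pure strategy
    row_mix, col_mix = [1.0], [1.0]
    for _ in range(max_iters):
        # Solve the game restricted to the current populations.
        restricted = [[A[i][j] for j in col_pop] for i in row_pop]
        row_mix, col_mix = fictitious_play(restricted)
        # Evaluate every pure strategy of the full game against the opponent's mixture.
        row_vals = [sum(A[i][j] * q for j, q in zip(col_pop, col_mix))
                    for i in range(len(A))]
        col_vals = [sum(A[i][j] * p for i, p in zip(row_pop, row_mix))
                    for j in range(len(A[0]))]
        row_br = row_vals.index(max(row_vals))
        col_br = col_vals.index(min(col_vals))
        # Simplified stopping rule: stop once neither best response is new.
        if row_br in row_pop and col_br in col_pop:
            break
        if row_br not in row_pop:
            row_pop.append(row_br)
        if col_br not in col_pop:
            col_pop.append(col_br)
    return row_pop, col_pop, row_mix, col_mix


if __name__ == "__main__":
    row_pop, col_pop, row_mix, col_mix = double_oracle(A)
    print("row population:", row_pop)
    print("column population:", col_pop)
    print("row mixture over population:", row_mix)
```

In this toy game the loop only stops after both populations contain all three pure strategies, because each restricted game has a pure equilibrium that the next pure strategy beats. A single stochastic policy close to the uniform mixture would already be nearly unexploitable, which is the kind of gap the abstract says SP-PSRO targets by also adding an approximately optimal stochastic policy in each iteration.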
