Tuomas Sandholm

Imperfect-Recall Games: Equilibrium Concepts and Their Complexity

Jun 23, 2024
Via arXiv

AlphaZeroES: Direct score maximization outperforms planning loss minimization

Jun 12, 2024
Via arXiv

Scalable Mechanism Design for Multi-Agent Path Finding

Jan 30, 2024
Via arXiv

Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property

Dec 21, 2023
Via arXiv

Confronting Reward Model Overoptimization with Constrained RLHF

Oct 10, 2023
Via arXiv

Planning in the imagination: High-level planning on learned abstract search spaces

Aug 16, 2023
Via arXiv

Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations

Jul 22, 2023
Via arXiv

On the Convergence of No-Regret Learning Dynamics in Time-Varying Games

Jan 26, 2023
Via arXiv

Computing equilibria by minimizing exploitability with best-response ensembles

Jan 20, 2023
Via arXiv

Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks

Nov 29, 2022
Via arXiv