Tuomas Sandholm

Scalable Mechanism Design for Multi-Agent Path Finding

Jan 30, 2024
Paul Friedrich, Yulun Zhang, Michael Curry, Ludwig Dierks, Stephen McAleer, Jiaoyang Li, Tuomas Sandholm, Sven Seuken

Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property

Dec 21, 2023
Ioannis Anagnostides, Ioannis Panageas, Gabriele Farina, Tuomas Sandholm

Confronting Reward Model Overoptimization with Constrained RLHF

Oct 10, 2023
Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen McAleer

Planning in the imagination: High-level planning on learned abstract search spaces

Aug 16, 2023
Carlos Martin, Tuomas Sandholm

Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations

Jul 22, 2023
Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Xiangyu Liu, Tuomas Sandholm, Furong Huang, Stephen McAleer

On the Convergence of No-Regret Learning Dynamics in Time-Varying Games

Jan 26, 2023
Ioannis Anagnostides, Ioannis Panageas, Gabriele Farina, Tuomas Sandholm

Computing equilibria by minimizing exploitability with best-response ensembles

Jan 20, 2023
Carlos Martin, Tuomas Sandholm

Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks

Nov 29, 2022
Carlos Martin, Tuomas Sandholm

Near-Optimal $Φ$-Regret Learning in Extensive-Form Games

Aug 20, 2022
Ioannis Anagnostides, Gabriele Farina, Tuomas Sandholm
