Roy Fox

Anytime PSRO for Two-Player Zero-Sum Games

Jan 28, 2022
Stephen McAleer, Kevin Wang, John Lanier, Marc Lanctot, Pierre Baldi, Tuomas Sandholm, Roy Fox


Anytime Optimal PSRO for Two-Player Zero-Sum Games

Jan 19, 2022
Stephen McAleer, Kevin Wang, Marc Lanctot, John Lanier, Pierre Baldi, Roy Fox


Target Entropy Annealing for Discrete Soft Actor-Critic

Dec 06, 2021
Yaosheng Xu, Dailin Hu, Litian Liang, Stephen McAleer, Pieter Abbeel, Roy Fox


Count-Based Temperature Scheduling for Maximum Entropy Reinforcement Learning

Nov 28, 2021
Dailin Hu, Pieter Abbeel, Roy Fox


Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates

Oct 28, 2021
Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox


Independent Natural Policy Gradient Always Converges in Markov Potential Games

Oct 20, 2021
Roy Fox, Stephen McAleer, Will Overman, Ioannis Panageas


Modular Framework for Visuomotor Language Grounding

Sep 05, 2021
Kolby Nottingham, Litian Liang, Daeyun Shin, Charless C. Fowlkes, Roy Fox, Sameer Singh


Improving Social Welfare While Preserving Autonomy via a Pareto Mediator

Jun 07, 2021
Stephen McAleer, John Lanier, Michael Dennis, Pierre Baldi, Roy Fox


XDO: A Double Oracle Algorithm for Extensive-Form Games

Mar 11, 2021
Stephen McAleer, John Lanier, Pierre Baldi, Roy Fox


A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks

Feb 08, 2021
Forest Agostinelli, Alexander Shmakov, Stephen McAleer, Roy Fox, Pierre Baldi
