Michael Bowling

Meta-Gradient Search Control: A Method for Improving the Efficiency of Dyna-style Planning

Jun 27, 2024

Beyond Optimism: Exploration With Partially Observable Rewards

Jun 20, 2024

Monitored Markov Decision Processes

Feb 13, 2024

Assessing the Interpretability of Programmatic Policies with Large Language Models

Nov 12, 2023

TacticAI: an AI assistant for football tactics

Oct 17, 2023

Proper Laplacian Representation Learning

Oct 16, 2023

Targeted Search Control in AlphaZero for Effective Policy Improvement

Feb 28, 2023

Settling the Reward Hypothesis

Dec 20, 2022

Over-communicate no more: Situated RL agents learn concise communication protocols

Nov 02, 2022

Interpolating Between Softmax Policy Gradient and Neural Replicator Dynamics with Capped Implicit Exploration

Jun 04, 2022