Tom Zahavy

Bootstrapped Meta-Learning

Sep 09, 2021
Via arXiv

Emphatic Algorithms for Deep Reinforcement Learning

Jun 21, 2021
Via arXiv

Discovering Diverse Nearly Optimal Policies with Successor Features

Jun 01, 2021
Via arXiv

Reward is enough for convex MDPs

Jun 01, 2021
Via arXiv

Online Apprenticeship Learning

Feb 13, 2021
Via arXiv

Discovery of Options via Meta-Learned Subgoals

Feb 12, 2021
Via arXiv

Discovering a set of policies for the worst case reward

Feb 08, 2021
Via arXiv

Online Limited Memory Neural-Linear Bandits with Likelihood Matching

Feb 07, 2021
Via arXiv

Balancing Constraints and Rewards with Meta-Gradient D4PG

Oct 13, 2020
Via arXiv

Learning to Ask Medical Questions using Reinforcement Learning

Mar 31, 2020
Via arXiv