Craig Boutilier

University of Toronto

ConQUR: Mitigating Delusional Bias in Deep Q-learning

Feb 27, 2020

Differentiable Bandit Exploration

Feb 17, 2020

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

Feb 12, 2020

BRPO: Batch Residual Policy Optimization

Feb 08, 2020

Gradient-based Optimization for Bayesian Preference Elicitation

Nov 20, 2019

CAQL: Continuous Action Q-Learning

Oct 09, 2019

RecSim: A Configurable Simulation Platform for Recommender Systems

Sep 26, 2019

Randomized Exploration in Generalized Linear Bandits

Jun 21, 2019

Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology

May 31, 2019

Advantage Amplification in Slowly Evolving Latent-State Environments

May 29, 2019