Craig Boutilier

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

Feb 12, 2020
Ge Liu, Rui Wu, Heng-Tze Cheng, Jing Wang, Jayden Ooi, Lihong Li, Ang Li, Wai Lok Sibon Li, Craig Boutilier, Ed Chi

BRPO: Batch Residual Policy Optimization

Feb 08, 2020
Sungryull Sohn, Yinlam Chow, Jayden Ooi, Ofir Nachum, Honglak Lee, Ed Chi, Craig Boutilier

Gradient-based Optimization for Bayesian Preference Elicitation

Nov 20, 2019
Ivan Vendrov, Tyler Lu, Qingqing Huang, Craig Boutilier

CAQL: Continuous Action Q-Learning

Oct 09, 2019
Moonkyung Ryu, Yinlam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier

RecSim: A Configurable Simulation Platform for Recommender Systems

Sep 26, 2019
Eugene Ie, Chih-wei Hsu, Martin Mladenov, Vihan Jain, Sanmit Narvekar, Jing Wang, Rui Wu, Craig Boutilier

Randomized Exploration in Generalized Linear Bandits

Jun 21, 2019
Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier

Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology

May 31, 2019
Eugene Ie, Vihan Jain, Jing Wang, Sanmit Narvekar, Ritesh Agarwal, Rui Wu, Heng-Tze Cheng, Morgane Lustman, Vince Gatto, Paul Covington, Jim McFadden, Tushar Chandra, Craig Boutilier

Advantage Amplification in Slowly Evolving Latent-State Environments

May 29, 2019
Martin Mladenov, Ofer Meshi, Jayden Ooi, Dale Schuurmans, Craig Boutilier
