
Shixiang Gu

Near-Optimal Representation Learning for Hierarchical Reinforcement Learning

Oct 02, 2018

The Mirage of Action-Dependent Baselines in Reinforcement Learning

Apr 06, 2018

Temporal Difference Models: Model-Free Deep RL for Model-Based Control

Feb 25, 2018

Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning

Nov 18, 2017

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control

Oct 16, 2017

Categorical Reparameterization with Gumbel-Softmax

Aug 05, 2017
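The Gumbel-Softmax paper above introduces a continuous relaxation of categorical sampling: perturb the log-probabilities with Gumbel(0, 1) noise and pass them through a temperature-scaled softmax, yielding an approximately one-hot sample that is differentiable in the logits. A minimal NumPy sketch of that sampling step (illustrative only, not the authors' code; function and parameter names are my own):

```python
import numpy as np

def gumbel_softmax_sample(logits, tau=1.0, rng=None):
    """Differentiable, approximately one-hot sample from a categorical.

    logits: unnormalized log-probabilities; tau: temperature
    (lower tau -> closer to a discrete one-hot sample).
    """
    rng = rng or np.random.default_rng(0)
    u = rng.uniform(size=logits.shape)   # u_i ~ Uniform(0, 1)
    g = -np.log(-np.log(u))              # g_i ~ Gumbel(0, 1)
    y = (logits + g) / tau               # perturb and temper
    y = np.exp(y - y.max())              # numerically stable softmax
    return y / y.sum()

sample = gumbel_softmax_sample(np.log(np.array([0.1, 0.3, 0.6])), tau=0.5)
# entries are positive and sum to 1; annealing tau toward 0 sharpens the sample
```

In the paper this relaxation replaces the non-differentiable argmax in the standard Gumbel-Max trick, so gradients can flow through discrete latent variables during training.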

Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

Jun 01, 2017

Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic

Feb 27, 2017

Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates

Nov 23, 2016

Continuous Deep Q-Learning with Model-based Acceleration

Mar 02, 2016