Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Vibhavari Dasagi

Ctrl-Z: Recovering from Instability in Reinforcement Learning

Oct 09, 2019

Vibhavari Dasagi, Jake Bruce, Thierry Peynot, Jürgen Leitner

Figure 1 for Ctrl-Z: Recovering from Instability in Reinforcement Learning

Figure 2 for Ctrl-Z: Recovering from Instability in Reinforcement Learning

Figure 3 for Ctrl-Z: Recovering from Instability in Reinforcement Learning

Figure 4 for Ctrl-Z: Recovering from Instability in Reinforcement Learning

Abstract:When learning behavior, training data is often generated by the learner itself; this can result in unstable training dynamics, and this problem has particularly important applications in safety-sensitive real-world control tasks such as robotics. In this work, we propose a principled and model-agnostic approach to mitigate the issue of unstable learning dynamics by maintaining a history of a reinforcement learning agent over the course of training, and reverting to the parameters of a previous agent whenever performance significantly decreases. We develop techniques for evaluating this performance through statistical hypothesis testing of continued improvement, and evaluate them on a standard suite of challenging benchmark tasks involving continuous control of simulated robots. We show improvements over state-of-the-art reinforcement learning algorithms in performance and robustness to hyperparameters, outperforming DDPG in 5 out of 6 evaluation environments and showing no decrease in performance with TD3, which is known to be relatively stable. In this way, our approach takes an important step towards increasing data efficiency and stability in training for real-world robotic applications.

* Submitted to ICRA2020, under review

Via

Access Paper or Ask Questions

Zero-shot Sim-to-Real Transfer with Modular Priors

Sep 20, 2018

Robert Lee, Serena Mou, Vibhavari Dasagi, Jake Bruce, Jürgen Leitner, Niko Sünderhauf

Figure 1 for Zero-shot Sim-to-Real Transfer with Modular Priors

Figure 2 for Zero-shot Sim-to-Real Transfer with Modular Priors

Figure 3 for Zero-shot Sim-to-Real Transfer with Modular Priors

Figure 4 for Zero-shot Sim-to-Real Transfer with Modular Priors

Abstract:Current end-to-end Reinforcement Learning (RL) approaches are severely limited by restrictively large search spaces and are prone to overfitting to their training environment. This is because in end-to-end RL perception, decision-making and low-level control are all being learned jointly from very sparse reward signals, with little capability of incorporating prior knowledge or existing algorithms. In this work, we propose a novel framework that effectively decouples RL for high-level decision making from low-level perception and control. This allows us to transfer a learned policy from a highly abstract simulation to a real robot without requiring any transfer learning. We therefore coin our approach zero-shot sim-to-real transfer. We successfully demonstrate our approach on the robot manipulation task of object sorting. A key component of our approach is a deep sets encoder that enables us to reinforcement learn the high-level policy based on the variable-length output of a pre-trained object detector, instead of learning from raw pixels. We show that this method can learn effective policies within mere minutes of highly simplified simulation. The learned policies can be directly deployed on a robot without further training, and generalize to variations of the task unseen during training.

* Submitted to ICRA 2019, under review

Via

Access Paper or Ask Questions