Pieter Abbeel

UC Berkeley

Automatic Goal Generation for Reinforcement Learning Agents

Jul 23, 2018

Carlos Florensa, David Held, Xinyang Geng, Pieter Abbeel

Abstract: Reinforcement learning is a powerful technique to train an agent to perform a task. However, an agent that is trained using reinforcement learning is only capable of achieving the single task that is specified via its reward function. Such an approach does not scale well to settings in which an agent needs to perform a diverse set of tasks, such as navigating to varying positions in a room or moving objects to varying locations. Instead, we propose a method that allows an agent to automatically discover the range of tasks that it is capable of performing. We use a generator network to propose tasks for the agent to try to achieve, specified as goal states. The generator network is optimized using adversarial training to produce tasks that are always at the appropriate level of difficulty for the agent. Our method thus automatically produces a curriculum of tasks for the agent to learn. We show that, by using this framework, an agent can efficiently and automatically learn to perform a wide set of tasks without requiring any prior knowledge of its environment. Our method can also learn to achieve tasks with sparse rewards, which traditionally pose significant challenges.

* Accepted at ICML 2018, Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018
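
One plausible reading of "appropriate level of difficulty" is a band of success rates: a goal is useful for training when the current policy reaches it sometimes, but not always. The sketch below labels goals that way. The function name `label_goals`, the thresholds `r_min` and `r_max`, and the example goals are assumptions made for illustration; the abstract does not specify them, and the generator network itself is not shown.

```python
def label_goals(success_rates, r_min=0.1, r_max=0.9):
    """Label each goal 1 if its success rate lies in [r_min, r_max], else 0.

    success_rates maps a goal (here a hypothetical 2D position) to the
    fraction of recent attempts in which the agent reached it.
    """
    return {goal: int(r_min <= rate <= r_max)
            for goal, rate in success_rates.items()}


# Hypothetical measurements: one mastered goal, one of intermediate
# difficulty, and one the agent never reaches.
success_rates = {(0.0, 1.0): 0.95, (2.0, 1.0): 0.5, (5.0, 5.0): 0.0}
labels = label_goals(success_rates)
```

Labels of this kind could serve as the training signal that separates useful goals from goals that are too easy or too hard, which is the role the abstract assigns to adversarial training.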

Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation

Jun 18, 2018

YuXuan Liu, Abhishek Gupta, Pieter Abbeel, Sergey Levine

Abstract: Imitation learning is an effective approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator. However, standard imitation learning methods assume that the agent receives examples of observation-action tuples that could be provided, for instance, to a supervised learning algorithm. This stands in contrast to how humans and animals imitate: we observe another person performing some behavior and then figure out which actions will realize that behavior, compensating for changes in viewpoint, surroundings, object positions and types, and other factors. We term this kind of imitation learning "imitation-from-observation," and propose an imitation learning method based on video prediction with context translation and deep reinforcement learning. This lifts the assumption in imitation learning that the demonstration should consist of observations in the same environment configuration, and enables a variety of interesting applications, including learning robotic skills that involve tool use simply by observing videos of human tool use. Our experimental results show the effectiveness of our approach in learning a wide range of real-world robotic tasks modeled after common household chores from videos of a human demonstrator, including sweeping, ladling almonds, and pushing objects, as well as a number of tasks in simulation.

* Accepted at ICRA 2018, Brisbane. YuXuan Liu and Abhishek Gupta had equal contribution

Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings

Jun 07, 2018

John D. Co-Reyes, YuXuan Liu, Abhishek Gupta, Benjamin Eysenbach, Pieter Abbeel, Sergey Levine

Abstract: In this work, we take a representation learning perspective on hierarchical reinforcement learning, where the problem of learning lower layers in a hierarchy is transformed into the problem of learning trajectory-level generative models. We show that we can learn continuous latent representations of trajectories, which are effective in solving temporally extended and multi-stage problems. Our proposed model, SeCTAR, draws inspiration from variational autoencoders, and learns latent representations of trajectories. A key component of this method is to learn both a latent-conditioned policy and a latent-conditioned model which are consistent with each other. Given the same latent, the policy generates a trajectory which should match the trajectory predicted by the model. This model provides a built-in prediction mechanism, by predicting the outcome of closed loop policy behavior. We propose a novel algorithm for performing hierarchical RL with this model, combining model-based planning in the learned latent space with an unsupervised exploration objective. We show that our model is effective at reasoning over long horizons with sparse rewards for several simulated tasks, outperforming standard reinforcement learning methods and prior methods for hierarchical reasoning, model-based planning, and exploration.

* Accepted at ICML 2018

Self-supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation

May 17, 2018

Gregory Kahn, Adam Villaflor, Bosen Ding, Pieter Abbeel, Sergey Levine

Abstract: Enabling robots to autonomously navigate complex environments is essential for real-world deployment. Prior methods approach this problem by having the robot maintain an internal map of the world, and then use a localization and planning method to navigate through the internal map. However, these approaches often include a variety of assumptions, are computationally intensive, and do not learn from failures. In contrast, learning-based methods improve as the robot acts in the environment, but are difficult to deploy in the real world due to their high sample complexity. To address the need to learn complex policies with few samples, we propose a generalized computation graph that subsumes value-based model-free methods and model-based methods, with specific instantiations interpolating between model-free and model-based. We then instantiate this graph to form a navigation model that learns from raw images and is sample efficient. Our simulated car experiments explore the design decisions of our navigation model, and show our approach outperforms single-step and $N$-step double Q-learning. We also evaluate our approach on a real-world RC car and show it can learn to navigate through a complex indoor environment with a few hours of fully autonomous, self-supervised training. Videos of the experiments and code can be found at github.com/gkahn13/gcg

* ICRA 2018
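
For readers unfamiliar with the baselines named in the abstract, the sketch below computes the standard N-step bootstrapped target used in N-step Q-learning. It is not the paper's generalized computation graph; the function name `n_step_target` and the numbers are illustrative assumptions. In double Q-learning, the bootstrap value would typically be the target network's estimate for the action picked by the online network.

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """Return r_0 + gamma*r_1 + ... + gamma^(N-1)*r_(N-1) + gamma^N * bootstrap.

    rewards holds the N observed rewards; bootstrap_value is the value
    estimate for the state reached after those N steps.
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for r in reversed(rewards):
        target = r + gamma * target
    return target


single_step = n_step_target([1.0], bootstrap_value=10.0, gamma=0.5)
two_step = n_step_target([1.0, 1.0], bootstrap_value=10.0, gamma=0.5)
```

With one reward the function reduces to the usual single-step target; longer reward lists rely more on observed rewards and less on the bootstrapped estimate.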

Evolved Policy Gradients

Apr 29, 2018

Rein Houthooft, Richard Y. Chen, Phillip Isola, Bradly C. Stadie, Filip Wolski, Jonathan Ho, Pieter Abbeel

Abstract: We propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve high rewards. The loss is parametrized via temporal convolutions over the agent's experience. Because this loss is highly flexible in its ability to take into account the agent's history, it enables fast task learning. Empirical results show that our evolved policy gradient algorithm (EPG) achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method. We also demonstrate that EPG's learned loss can generalize to out-of-distribution test time tasks, and exhibits qualitatively different behavior from other popular metalearning algorithms.

The Limits and Potentials of Deep Learning for Robotics

Apr 18, 2018

Niko Sünderhauf, Oliver Brock, Walter Scheirer, Raia Hadsell, Dieter Fox, Jürgen Leitner, Ben Upcroft, Pieter Abbeel, Wolfram Burgard, Michael Milford (+1 more)

Abstract: The application of deep learning in robotics leads to very specific problems and research questions that are typically not addressed by the computer vision and machine learning communities. In this paper we discuss a number of robotics-specific learning, reasoning, and embodiment challenges for deep learning. We explain the need for better evaluation metrics, highlight the importance and unique challenges for deep robotic learning in simulation, and explore the spectrum between purely data-driven and model-driven approaches. We hope this paper provides a motivating overview of important research directions to overcome the current limitations, and help fulfill the promising potentials of deep learning in robotics.

Stochastic Adversarial Video Prediction

Apr 04, 2018

Alex X. Lee, Richard Zhang, Frederik Ebert, Pieter Abbeel, Chelsea Finn, Sergey Levine

Abstract: Being able to predict what may happen in the future requires an in-depth understanding of the physical and causal rules that govern the world. A model that is able to do so has a number of appealing applications, from robotic planning to representation learning. However, learning to predict raw future observations, such as frames in a video, is exceedingly challenging -- the ambiguous nature of the problem can cause a naively designed model to average together possible futures into a single, blurry prediction. Recently, this has been addressed by two distinct approaches: (a) variational latent variable models that explicitly model underlying stochasticity and (b) adversarially-trained models that aim to produce naturalistic images. However, a standard latent variable model can struggle to produce realistic results, and a standard adversarially-trained model underutilizes latent variables and fails to produce diverse predictions. We show that these distinct methods are in fact complementary. Combining the two produces predictions that look more realistic to human raters and better cover the range of possible futures. Our method outperforms prior and concurrent work in these aspects.

* Website: https://alexlee-gk.github.io/video_prediction/
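
The "blurry prediction" problem mentioned in the abstract can be seen in a one-pixel toy case. Suppose a pixel is equally likely to be black (0.0) or white (1.0) in the next frame. A deterministic predictor trained with squared error does best by outputting gray, which matches neither future. The setup and names below are assumptions made for illustration and are not the paper's model.

```python
def expected_squared_error(prediction, futures):
    """Mean squared error of one fixed prediction against equally likely futures."""
    return sum((prediction - f) ** 2 for f in futures) / len(futures)


# Two equally likely futures for a single pixel: black or white.
futures = [0.0, 1.0]

# Search a grid of candidate predictions for the one with the lowest error.
candidates = [i / 100 for i in range(101)]
best = min(candidates, key=lambda p: expected_squared_error(p, futures))
```

Here `best` is the average of the two futures rather than either one, which is the averaging effect that latent variables and adversarial losses are meant to counter.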

Universal Planning Networks

Apr 04, 2018

Aravind Srinivas, Allan Jabri, Pieter Abbeel, Sergey Levine, Chelsea Finn

Abstract: A key challenge in complex visuomotor control is learning abstract representations that are effective for specifying goals, planning, and generalization. To this end, we introduce universal planning networks (UPN). UPNs embed differentiable planning within a goal-directed policy. This planning computation unrolls a forward model in a latent space and infers an optimal action plan through gradient descent trajectory optimization. The plan-by-gradient-descent process and its underlying representations are learned end-to-end to directly optimize a supervised imitation learning objective. We find that the representations learned are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but can also provide a metric for specifying goals using images. The learned representations can be leveraged to specify distance-based rewards to reach new target states for model-free reinforcement learning, resulting in substantially more effective learning when solving new tasks described via image-based goals. We were able to achieve successful transfer of visuomotor planning strategies across robots with significantly different morphologies and actuation capabilities.

* Videos available at https://sites.google.com/view/upn-public/home
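
The plan-by-gradient-descent idea in the abstract can be illustrated with a toy problem. The sketch below unrolls a hand-written one-dimensional forward model and adjusts a sequence of actions by gradient descent until the final state reaches a goal. Everything here is an assumption made for illustration: UPN learns its forward model and latent space end-to-end, whereas this example uses fixed dynamics (`next state = state + action`) and an analytic gradient.

```python
def rollout(x0, actions):
    """Unroll the toy forward model: next state = state + action."""
    x = x0
    for a in actions:
        x = x + a
    return x


def plan(x0, goal, horizon=5, lr=0.05, steps=200):
    """Gradient descent on the actions to minimize (final_state - goal)^2."""
    actions = [0.0] * horizon
    for _ in range(steps):
        error = rollout(x0, actions) - goal
        # With additive dynamics, the loss gradient is 2 * error for every action.
        grad = 2.0 * error
        actions = [a - lr * grad for a in actions]
    return actions


actions = plan(x0=0.0, goal=1.0)
final_state = rollout(0.0, actions)
```

Because all actions start equal and receive the same gradient, the planner spreads the required displacement evenly over the horizon. A learned model would replace `rollout`, and automatic differentiation would replace the hand-derived gradient.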

Domain Randomization and Generative Models for Robotic Grasping

Apr 03, 2018

Joshua Tobin, Lukas Biewald, Rocky Duan, Marcin Andrychowicz, Ankur Handa, Vikash Kumar, Bob McGrew, Jonas Schneider, Peter Welinder, Wojciech Zaremba (+1 more)

Abstract: Deep learning-based robotic grasping has made significant progress thanks to algorithmic improvements and increased data availability. However, state-of-the-art models are often trained on as few as hundreds or thousands of unique object instances, and as a result generalization can be a challenge. In this work, we explore a novel data generation pipeline for training a deep neural network to perform grasp planning that applies the idea of domain randomization to object synthesis. We generate millions of unique, unrealistic procedurally generated objects, and train a deep neural network to perform grasp planning on these objects. Since the distribution of successful grasps for a given object can be highly multimodal, we propose an autoregressive grasp planning model that maps sensor inputs of a scene to a probability distribution over possible grasps. This model allows us to sample grasps efficiently at test time (or avoid sampling entirely). We evaluate our model architecture and data generation pipeline in simulation and the real world. We find we can achieve a >90% success rate on previously unseen realistic objects at test time in simulation despite having only been trained on random objects. We also demonstrate an 80% success rate on real-world grasp attempts despite having only been trained on random simulated objects.

* 8 pages, 11 figures. Submitted to 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018)

Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines

Mar 20, 2018

Cathy Wu, Aravind Rajeswaran, Yan Duan, Vikash Kumar, Alexandre M Bayen, Sham Kakade, Igor Mordatch, Pieter Abbeel

Abstract: Policy gradient methods have enjoyed great success in deep reinforcement learning but suffer from high variance of gradient estimates. The high variance problem is particularly exacerbated in problems with long horizons or high-dimensional action spaces. To mitigate this issue, we derive a bias-free action-dependent baseline for variance reduction which fully exploits the structural form of the stochastic policy itself and does not make any additional assumptions about the MDP. We demonstrate and quantify the benefit of the action-dependent baseline through both theoretical analysis as well as numerical results, including an analysis of the suboptimality of the optimal state-dependent baseline. The result is a computationally efficient policy gradient algorithm, which scales to high-dimensional control problems, as demonstrated by a synthetic 2000-dimensional target matching task. Our experimental results indicate that action-dependent baselines allow for faster learning on standard reinforcement learning benchmarks and high-dimensional hand manipulation and synthetic tasks. Finally, we show that the general idea of including additional information in baselines for improved variance reduction can be extended to partially observed and multi-agent tasks.

* Accepted to ICLR 2018, Oral (2%)
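
A small exact calculation can show why a baseline that depends on the other action dimensions may reduce variance without adding bias. The sketch below assumes a single fixed state and a policy with two independent Bernoulli action factors, and it enumerates all four actions to compute the mean and variance of the score-function gradient estimate for the first factor. The reward function, probabilities, and names are assumptions made for illustration; this is not the paper's algorithm or its experiments.

```python
from itertools import product

P1, P2 = 0.5, 0.5  # probability that each action factor equals 1


def reward(a1, a2):
    """Toy reward dominated by the second action factor."""
    return a1 + 10.0 * a2


def prob(a1, a2):
    """Probability of the joint action under the factorized policy."""
    return (P1 if a1 else 1 - P1) * (P2 if a2 else 1 - P2)


def estimator_moments(baseline):
    """Exact mean and variance of (a1 - P1) * (reward - baseline(a2)).

    (a1 - P1) is the score of a Bernoulli factor with a sigmoid logit.
    """
    mean = second = 0.0
    for a1, a2 in product((0, 1), repeat=2):
        g = (a1 - P1) * (reward(a1, a2) - baseline(a2))
        mean += prob(a1, a2) * g
        second += prob(a1, a2) * g * g
    return mean, second - mean ** 2


no_baseline = estimator_moments(lambda a2: 0.0)
# Constant baseline: the expected reward over both factors.
constant_baseline = estimator_moments(lambda a2: 5.5)
# Baseline for factor 1 that depends on a2: expected reward over a1 given a2.
action_baseline = estimator_moments(lambda a2: 0.5 + 10.0 * a2)
```

All three estimators share the same mean, because the baseline for factor 1 never depends on `a1`. In this toy case the variance shrinks from no baseline, to the constant baseline, to the baseline that conditions on `a2`.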
