Pieter Abbeel

UC Berkeley

Task-Agnostic Morphology Evolution

Feb 25, 2021

Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto

Abstract: Deep reinforcement learning primarily focuses on learning behavior, usually overlooking the fact that an agent's function is largely determined by form. So, how should one go about finding a morphology fit for solving tasks in a given environment? Current approaches that co-adapt morphology and behavior use a specific task's reward as a signal for morphology optimization. However, this often requires expensive policy optimization and results in task-dependent morphologies that are not built to generalize. In this work, we propose a new approach, Task-Agnostic Morphology Evolution (TAME), to alleviate both of these issues. Without any task or reward specification, TAME evolves morphologies by only applying randomly sampled action primitives on a population of agents. This is accomplished using an information-theoretic objective that efficiently ranks agents by their ability to reach diverse states in the environment and the causality of their actions. Finally, we empirically demonstrate that across 2D, 3D, and manipulation environments TAME can evolve morphologies that match the multi-task performance of those learned with task-supervised algorithms. Our code and videos can be found at https://sites.google.com/view/task-agnostic-evolution.

* ICLR 2021

State Entropy Maximization with Random Encoders for Efficient Exploration

Feb 18, 2021

Younggyo Seo, Lili Chen, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee

Abstract: Recent exploration methods have proven to be a recipe for improving sample-efficiency in deep reinforcement learning (RL). However, efficient exploration in high-dimensional observation spaces remains a challenge. This paper presents Random Encoders for Efficient Exploration (RE3), an exploration method that utilizes state entropy as an intrinsic reward. To estimate state entropy in environments with high-dimensional observations, we utilize a k-nearest neighbor entropy estimator in the low-dimensional representation space of a convolutional encoder. In particular, we find that the state entropy can be estimated in a stable and compute-efficient manner by utilizing a randomly initialized encoder, which is fixed throughout training. Our experiments show that RE3 significantly improves the sample-efficiency of both model-free and model-based RL methods on locomotion and navigation tasks from DeepMind Control Suite and MiniGrid benchmarks. We also show that RE3 allows learning diverse behaviors without extrinsic rewards, effectively improving sample-efficiency in downstream tasks. Source code and videos are available at https://sites.google.com/view/re3-rl.

* First two authors contributed equally, website: https://sites.google.com/view/re3-rl
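
The k-nearest-neighbor reward idea described in the RE3 abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the fixed random linear map stands in for the randomly initialized convolutional encoder, and the function names, the default `k`, and the `log(distance + 1)` reward form are hypothetical choices made for the example.

```python
import math
import random


def make_random_encoder(obs_dim, latent_dim, seed=0):
    """Build a fixed random linear map (assumed stand-in for a random conv encoder)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(obs_dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(obs_dim)]
               for _ in range(latent_dim)]

    def encode(obs):
        # The weights are never updated, mirroring an encoder fixed during training.
        return [sum(w * x for w, x in zip(row, obs)) for row in weights]

    return encode


def knn_intrinsic_rewards(latents, k=3):
    """Reward each latent by log(1 + distance to its k-th nearest neighbor)."""
    rewards = []
    for i, y in enumerate(latents):
        dists = sorted(math.dist(y, other)
                       for j, other in enumerate(latents) if j != i)
        # A large k-NN distance means the state lies in a sparsely visited region.
        rewards.append(math.log(dists[k - 1] + 1.0))
    return rewards
```

States far from everything else in the batch receive the largest reward, which is what pushes the agent toward less-visited regions. The batch must contain more than `k` latents for the k-th neighbor to exist.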

Bottleneck Transformers for Visual Recognition

Jan 27, 2021

Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani

Abstract: We present BoTNet, a conceptually simple yet powerful backbone architecture that incorporates self-attention for multiple computer vision tasks including image classification, object detection and instance segmentation. By just replacing the spatial convolutions with global self-attention in the final three bottleneck blocks of a ResNet and no other changes, our approach improves upon the baselines significantly on instance segmentation and object detection while also reducing the parameters, with minimal overhead in latency. Through the design of BoTNet, we also point out how ResNet bottleneck blocks with self-attention can be viewed as Transformer blocks. Without any bells and whistles, BoTNet achieves 44.4% Mask AP and 49.7% Box AP on the COCO Instance Segmentation benchmark using the Mask R-CNN framework, surpassing the previous best published single model and single scale results of ResNeSt evaluated on the COCO validation set. Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy on the ImageNet benchmark while being up to 2.33x faster in compute time than the popular EfficientNet models on TPU-v3 hardware. We hope our simple and effective approach will serve as a strong baseline for future research in self-attention models for vision.

* Technical Report, 20 pages, 13 figures, 19 tables

Reinforcement Learning with Latent Flow

Jan 06, 2021

Wenling Shang, Xiaofei Wang, Aravind Srinivas, Aravind Rajeswaran, Yang Gao, Pieter Abbeel, Michael Laskin

Abstract: Temporal information is essential to learning effective policies with Reinforcement Learning (RL). However, current state-of-the-art RL algorithms either assume that such information is given as part of the state space or, when learning from pixels, use the simple heuristic of frame-stacking to implicitly capture temporal information present in the image observations. This heuristic is in contrast to the current paradigm in video classification architectures, which utilize explicit encodings of temporal information through methods such as optical flow and two-stream architectures to achieve state-of-the-art performance. Inspired by leading video classification architectures, we introduce the Flow of Latents for Reinforcement Learning (Flare), a network architecture for RL that explicitly encodes temporal information through latent vector differences. We show that Flare (i) recovers optimal performance in state-based RL without explicit access to the state velocity, solely with positional state information, (ii) achieves state-of-the-art performance on pixel-based challenging continuous control tasks within the DeepMind control benchmark suite, namely quadruped walk, hopper hop, finger turn hard, pendulum swing, and walker run, (iii) is the most sample-efficient model-free pixel-based RL algorithm, outperforming the prior model-free state-of-the-art by 1.9X and 1.5X on the 500k and 1M step benchmarks, respectively, and (iv), when augmented over Rainbow DQN, outperforms this state-of-the-art level baseline on 5 of 8 challenging Atari games at the 100M time step benchmark.
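
The phrase "latent vector differences" in the Flare abstract can be illustrated with a small sketch. It assumes each frame has already been encoded into a latent vector by some encoder (omitted here), and that fusion is a flat concatenation of the latents with their consecutive differences. The function name and the fusion layout are assumptions for illustration, not the paper's exact architecture.

```python
def latent_flow_features(latents):
    """Concatenate per-frame latents with differences between consecutive latents."""
    # Differences between neighboring frames act as an explicit velocity-like signal.
    diffs = [[b - a for a, b in zip(prev, curr)]
             for prev, curr in zip(latents, latents[1:])]
    fused = []
    for vec in latents + diffs:
        fused.extend(vec)
    return fused


# Three frames with 2-dimensional latents yield (2 * 3 - 1) * 2 = 10 features.
features = latent_flow_features([[0, 0], [1, 2], [3, 5]])
```

Plain frame-stacking would pass only the first six values to the policy; the last four make the temporal change explicit instead of leaving the network to infer it.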

Reset-Free Lifelong Learning with Skill-Space Planning

Jan 01, 2021

Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

Abstract: The objective of lifelong reinforcement learning (RL) is to optimize agents which can continuously adapt and interact in changing environments. However, current RL approaches fail drastically when environments are non-stationary and interactions are non-episodic. We propose Lifelong Skill Planning (LiSP), an algorithmic framework for non-episodic lifelong RL based on planning in an abstract space of higher-order skills. We learn the skills in an unsupervised manner using intrinsic rewards and plan over the learned skills using a learned dynamics model. Moreover, our framework permits skill discovery even from offline data, thereby reducing the need for excessive real-world interactions. We demonstrate empirically that LiSP successfully enables long-horizon planning and learns agents that can avoid catastrophic failures even in challenging non-stationary and non-episodic environments derived from gridworld and MuJoCo benchmarks.

* Website link: https://sites.google.com/berkeley.edu/reset-free-lifelong-learning

A Framework for Efficient Robotic Manipulation

Dec 14, 2020

Albert Zhan, Philip Zhao, Lerrel Pinto, Pieter Abbeel, Michael Laskin

Abstract: Data-efficient learning of manipulation policies from visual observations is an outstanding challenge for real-robot learning. While deep reinforcement learning (RL) algorithms have shown success learning policies from visual observations, they still require an impractical number of real-world data samples to learn effective policies. However, recent advances in unsupervised representation learning and data augmentation have significantly improved the sample efficiency of training RL policies on common simulated benchmarks. Building on these advances, we present a Framework for Efficient Robotic Manipulation (FERM) that utilizes data augmentation and unsupervised learning to achieve extremely sample-efficient training of robotic manipulation policies with sparse rewards. We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels, such as reaching, picking, moving, pulling a large object, flipping a switch, and opening a drawer in just 15-50 minutes of real-world training time. We include videos, code, and additional information on the project website: https://sites.google.com/view/efficient-robotic-manipulation.

Parallel Training of Deep Networks with Local Updates

Dec 07, 2020

Michael Laskin, Luke Metz, Seth Nabarro, Mark Saroufim, Badreddine Noune, Carlo Luschi, Jascha Sohl-Dickstein, Pieter Abbeel

Abstract: Deep learning models trained on large data sets have been widely successful in both vision and language domains. As state-of-the-art deep learning architectures have continued to grow in parameter count, so have the compute budgets and times required to train them, increasing the need for compute-efficient methods that parallelize training. Two common approaches to parallelize the training of deep networks have been data and model parallelism. While useful, data and model parallelism suffer from diminishing returns in terms of compute efficiency for large batch sizes. In this paper, we investigate how to continue scaling compute efficiently beyond the point of diminishing returns for large batches through local parallelism, a framework which parallelizes training of individual layers in deep networks by replacing global backpropagation with truncated layer-wise backpropagation. Local parallelism enables fully asynchronous layer-wise parallelism with a low memory footprint, and requires little communication overhead compared with model parallelism. We show results in both vision and language domains across a diverse set of architectures, and find that local parallelism is particularly effective in the high-compute regime.

* First two authors (Michael Laskin and Luke Metz) contributed equally. Order was determined by a coin flip.

Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning

Oct 26, 2020

Younggyo Seo, Kimin Lee, Ignasi Clavera, Thanard Kurutach, Jinwoo Shin, Pieter Abbeel

Abstract: Model-based reinforcement learning (RL) has shown great potential in various control tasks in terms of both sample-efficiency and final performance. However, learning a generalizable dynamics model robust to changes in dynamics remains a challenge since the target transition dynamics follow a multi-modal distribution. In this paper, we present a new model-based RL algorithm, coined trajectory-wise multiple choice learning, that learns a multi-headed dynamics model for dynamics generalization. The main idea is updating the most accurate prediction head to specialize each head in certain environments with similar dynamics, i.e., clustering environments. Moreover, we incorporate context learning, which encodes dynamics-specific information from past experiences into the context latent vector, enabling the model to perform online adaptation to unseen environments. Finally, to utilize the specialized prediction heads more effectively, we propose an adaptive planning method, which selects the most accurate prediction head over a recent experience. Our method exhibits superior zero-shot generalization performance across a variety of control tasks, compared to state-of-the-art RL methods. Source code and videos are available at https://sites.google.com/view/trajectory-mcl.

* Accepted at NeurIPS 2020. First two authors contributed equally, website: https://sites.google.com/view/trajectory-mcl, code: https://github.com/younggyoseo/trajectory_mcl
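
The "update only the most accurate prediction head" idea from the abstract above can be sketched as a toy example. Everything concrete here is an assumption for illustration: each head is a single scalar `theta` predicting `s_next = s + theta * a`, the loss is mean squared error over a trajectory segment, and the update is one gradient step. The actual method uses neural network heads and a context encoder, which are omitted.

```python
def trajectory_loss(theta, segment):
    """Mean squared one-step prediction error of one head over a trajectory segment."""
    return sum((s + theta * a - s_next) ** 2
               for s, a, s_next in segment) / len(segment)


def mcl_update(heads, segment, lr=0.1):
    """Take a gradient step on the lowest-loss head only and return its index."""
    losses = [trajectory_loss(theta, segment) for theta in heads]
    best = min(range(len(heads)), key=losses.__getitem__)
    # Gradient of the mean squared error with respect to the winning head's theta.
    grad = sum(2 * (s + heads[best] * a - s_next) * a
               for s, a, s_next in segment) / len(segment)
    heads[best] -= lr * grad
    return best


heads = [0.5, 2.5]
# Segments of (state, action, next_state) from two hypothetical environments.
slow_env = [(0.0, 1.0, 1.0), (1.0, 1.0, 2.0), (2.0, 1.0, 3.0)]  # true theta = 1
fast_env = [(0.0, 1.0, 3.0), (1.0, 1.0, 4.0)]                   # true theta = 3
winner_slow = mcl_update(heads, slow_env)
winner_fast = mcl_update(heads, fast_env)
```

Because only the winning head moves toward each segment's dynamics, the heads drift apart and specialize, which is the clustering effect the abstract describes.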

LaND: Learning to Navigate from Disengagements

Oct 09, 2020

Gregory Kahn, Pieter Abbeel, Sergey Levine

Abstract: Consistently testing autonomous mobile robots in real-world scenarios is a necessary aspect of developing autonomous navigation systems. Each time the human safety monitor disengages the robot's autonomy system due to the robot performing an undesirable maneuver, the autonomy developers gain insight into how to improve the autonomy system. However, we believe that these disengagements not only show where the system fails, which is useful for troubleshooting, but also provide a direct learning signal by which the robot can learn to navigate. We present a reinforcement learning approach for learning to navigate from disengagements, or LaND. LaND learns a neural network model that predicts which actions lead to disengagements given the current sensory observation, and then at test time plans and executes actions that avoid disengagements. Our results demonstrate LaND can successfully learn to navigate in diverse, real-world sidewalk environments, outperforming both imitation learning and reinforcement learning approaches. Videos, code, and other material are available on our website: https://sites.google.com/view/sidewalk-learning

Decoupling Representation Learning from Reinforcement Learning

Sep 30, 2020

Adam Stooke, Kimin Lee, Pieter Abbeel, Michael Laskin

Abstract: In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning. To this end, we introduce a new unsupervised learning (UL) task, called Augmented Temporal Contrast (ATC), which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss. In online RL experiments, we show that training the encoder exclusively using ATC matches or outperforms end-to-end RL in most environments. Additionally, we benchmark several leading UL algorithms by pre-training encoders on expert demonstrations and using them, with weights frozen, in RL agents; we find that agents using ATC-trained encoders outperform all others. We also train multi-task encoders on data from multiple environments and show generalization to different downstream RL tasks. Finally, we ablate components of ATC, and introduce a new data augmentation to enable replay of (compressed) latent images from pre-trained encoders when RL requires augmentation. Our experiments span visually diverse RL benchmarks in DeepMind Control, DeepMind Lab, and Atari, and our complete code is available at https://github.com/astooke/rlpyt/tree/master/rlpyt/ul.

* Improved related works and fixed code hyperlink
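
The contrastive objective mentioned in the ATC abstract can be illustrated with a generic InfoNCE-style loss. This is a hedged sketch rather than the loss in the linked repository: it assumes the anchor and positive latents (observations at time t and a short time later) are already encoded, uses plain dot-product similarity, and omits the encoder and image augmentations. The function name is hypothetical.

```python
import math


def info_nce_loss(anchors, positives):
    """Average cross-entropy of matching anchor i to positive i among all positives."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Similarity of this anchor to every candidate in the batch.
        logits = [sum(x * y for x, y in zip(anchor, p)) for p in positives]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_sum - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is lower when each anchor is most similar to its own temporally paired positive (`aligned`) than when the pairing is broken (`shuffled`), which is the signal that trains the encoder.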
