Igor Mordatch

Implicit Behavioral Cloning

Sep 01, 2021

Pete Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian Wong, Johnny Lee, Igor Mordatch, Jonathan Tompson

Abstract: We find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used explicit models. We present extensive experiments on this finding, and we provide both intuitive insight and theoretical arguments distinguishing the properties of implicit models compared to their explicit counterparts, particularly with respect to approximating complex, potentially discontinuous and multi-valued (set-valued) functions. On robotic policy learning tasks we show that implicit behavioral cloning policies with energy-based models (EBM) often outperform common explicit (Mean Square Error, or Mixture Density) behavioral cloning policies, including on tasks with high-dimensional action spaces and visual image inputs. We find these policies provide competitive results or outperform state-of-the-art offline reinforcement learning methods on the challenging human-expert tasks from the D4RL benchmark suite, despite using no reward information. In the real world, robots with implicit policies can learn complex and remarkably subtle behaviors on contact-rich tasks from human demonstrations, including tasks with high combinatorial complexity and tasks requiring 1mm precision.
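
The contrast between implicit and explicit policies on a multi-valued target can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the hand-written `energy` function, the uniform sampling in `implicit_policy`, and the 1-D action space are hypothetical and are not the paper's model or inference procedure. The sketch only shows that picking the action with the lowest energy can land on one valid mode, while the minimizer of a mean-squared-error objective over two symmetric modes is their average.

```python
import random


def energy(obs, action):
    # Hypothetical energy: low when the action is near +obs or -obs,
    # so every observation has two equally good actions (a set-valued target).
    return min((action - obs) ** 2, (action + obs) ** 2)


def implicit_policy(obs, num_samples=2048, low=-2.0, high=2.0, seed=0):
    # Sampling-based inference: draw candidate actions uniformly and
    # return the candidate with the lowest energy.
    rng = random.Random(seed)
    candidates = [rng.uniform(low, high) for _ in range(num_samples)]
    return min(candidates, key=lambda a: energy(obs, a))


def explicit_mse_optimum(obs):
    # The MSE-optimal prediction for demonstrations split evenly between
    # +obs and -obs is their mean, which matches neither mode.
    return (obs + (-obs)) / 2.0


obs = 1.0
implicit_action = implicit_policy(obs)
explicit_action = explicit_mse_optimum(obs)
print("implicit action energy:", energy(obs, implicit_action))
print("explicit action energy:", energy(obs, explicit_action))
```

In this toy setup the implicit action sits close to one of the two modes, whereas the explicit average falls between them and has high energy.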

Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot

Jul 14, 2021

Joel Z. Leibo, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, John P. Agapiou, Peter Sunehag, Raphael Koster, Jayd Matyas, Charles Beattie, Igor Mordatch, Thore Graepel

Abstract: Existing evaluation suites for multi-agent reinforcement learning (MARL) do not assess generalization to novel situations as their primary objective (unlike supervised-learning benchmarks). Our contribution, Melting Pot, is a MARL evaluation suite that fills this gap, and uses reinforcement learning to reduce the human labor required to create novel test scenarios. This works because one agent's behavior constitutes (part of) another agent's environment. To demonstrate scalability, we have created over 80 unique test scenarios covering a broad range of research topics such as social dilemmas, reciprocity, resource sharing, and task partitioning. We apply these test scenarios to standard MARL training algorithms, and demonstrate how Melting Pot reveals weaknesses not apparent from training performance alone.

* In International Conference on Machine Learning 2021 (pp. 6187-6199). PMLR
* Accepted to ICML 2021 and presented as a long talk; 33 pages; 9 figures

Brax -- A Differentiable Physics Engine for Large Scale Rigid Body Simulation

Jun 24, 2021

C. Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, Olivier Bachem

Abstract: We present Brax, an open source library for rigid body simulation with a focus on performance and parallelism on accelerators, written in JAX. We present results on a suite of tasks inspired by the existing reinforcement learning literature, but remade in our engine. Additionally, we provide reimplementations of PPO, SAC, ES, and direct policy optimization in JAX that compile alongside our environments, allowing the learning algorithm and the environment processing to occur on the same device, and to scale seamlessly on accelerators. Finally, we include notebooks that facilitate training of performant policies on common OpenAI Gym MuJoCo-like tasks in minutes.

* 9 pages + 12 pages of appendices and references. In submission at NeurIPS 2021 Datasets and Benchmarks Track

Model-Based Reinforcement Learning via Latent-Space Collocation

Jun 24, 2021

Oleh Rybkin, Chuning Zhu, Anusha Nagabandi, Kostas Daniilidis, Igor Mordatch, Sergey Levine

Abstract: The ability to plan into the future while utilizing only raw high-dimensional observations, such as images, can provide autonomous agents with broad capabilities. Visual model-based reinforcement learning (RL) methods that plan future actions directly have shown impressive results on tasks that require only short-horizon reasoning; however, these methods struggle on temporally extended tasks. We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions, as the effects of actions greatly compound over time and are harder to optimize. To achieve this, we draw on the idea of collocation, which has shown good results on long-horizon tasks in optimal control literature, and adapt it to the image-based setting by utilizing learned latent state space models. The resulting latent collocation method (LatCo) optimizes trajectories of latent states, which improves over previously proposed shooting methods for visual model-based RL on tasks with sparse rewards and long-term goals. Videos and code at https://orybkin.github.io/latco/.

* International Conference on Machine Learning (ICML), 2021. Videos and code at https://orybkin.github.io/latco/
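
The difference between shooting and collocation described in the abstract above can be sketched on a toy problem. This is an illustrative assumption, not LatCo itself: the 1-D `dynamics` function, the quadratic goal cost, and the quadratic `penalty` on dynamics violations are all hypothetical stand-ins for a learned latent model and the paper's actual optimizer. Shooting treats only the actions as free variables and rolls the dynamics forward; collocation treats the states as free variables too and penalizes trajectories that the dynamics cannot produce.

```python
def dynamics(state, action):
    # Hypothetical known dynamics; a learned latent model would go here.
    return state + action


def shooting_objective(initial_state, actions, goal):
    # Shooting: only actions are optimized; states follow from a rollout.
    state = initial_state
    for action in actions:
        state = dynamics(state, action)
    return (state - goal) ** 2


def collocation_objective(states, actions, goal, penalty=10.0):
    # Collocation: states and actions are both optimization variables.
    # states has one more entry than actions (states[0] is the start state).
    goal_cost = (states[-1] - goal) ** 2
    violation = sum(
        (states[t + 1] - dynamics(states[t], actions[t])) ** 2
        for t in range(len(actions))
    )
    return goal_cost + penalty * violation


# A dynamically consistent plan gives the same cost under both objectives.
actions = [1.0, 1.0, 1.0]
states = [0.0, 1.0, 2.0, 3.0]
print(shooting_objective(0.0, actions, goal=3.0))
print(collocation_objective(states, actions, goal=3.0))

# Collocation can also score a plan whose states already reach the goal
# while the actions are not yet consistent with them.
print(collocation_objective(states, [0.0, 0.0, 0.0], goal=3.0))
```

In the last call the goal cost is zero and the whole cost comes from the dynamics penalty, which is the quantity an optimizer would then drive down.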

Decision Transformer: Reinforcement Learning via Sequence Modeling

Jun 24, 2021

Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch

Abstract: We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances in language modeling such as GPT-x and BERT. In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.

* First two authors contributed equally. Last two authors advised equally
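
The conditioning on desired return, past states, and actions described in the abstract above can be sketched as plain data preparation. This is a minimal illustration under stated assumptions: the helper names `returns_to_go`, `interleave`, and `next_target_return` are hypothetical, the tokens are simple tuples rather than embeddings, and no Transformer is involved. It shows one common way to build such a sequence: each timestep contributes the return still to be collected, the state, and the action.

```python
def returns_to_go(rewards):
    # Suffix sums: result[t] = rewards[t] + rewards[t + 1] + ... + rewards[-1].
    result = []
    running = 0.0
    for reward in reversed(rewards):
        running += reward
        result.append(running)
    result.reverse()
    return result


def interleave(rtg, states, actions):
    # One (return-to-go, state, action) triple per timestep, in time order.
    tokens = []
    for g, s, a in zip(rtg, states, actions):
        tokens.extend([("return", g), ("state", s), ("action", a)])
    return tokens


def next_target_return(target, reward):
    # During a rollout, the desired return shrinks by the reward just received.
    return target - reward


rewards = [1.0, 0.0, 2.0]
rtg = returns_to_go(rewards)
sequence = interleave(rtg, ["s0", "s1", "s2"], ["a0", "a1", "a2"])
print(rtg)
print(sequence[:3])
```

An autoregressive model trained on such sequences would be asked to predict each action token from everything before it; at test time the first return token is set to the return one wants to achieve.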

Pretrained Transformers as Universal Computation Engines

Mar 09, 2021

Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

Abstract: We investigate the capability of a transformer pretrained on natural language to generalize to other modalities with minimal finetuning -- in particular, without finetuning of the self-attention and feedforward layers of the residual blocks. We consider such a model, which we call a Frozen Pretrained Transformer (FPT), and study finetuning it on a variety of sequence classification tasks spanning numerical computation, vision, and protein fold prediction. In contrast to prior works which investigate finetuning on the same modality as the pretraining dataset, we show that pretraining on natural language improves performance and compute efficiency on non-language downstream tasks. In particular, we find that such pretraining enables FPT to generalize in zero-shot to these modalities, matching the performance of a transformer fully trained on these tasks.

Reset-Free Lifelong Learning with Skill-Space Planning

Jan 01, 2021

Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

Abstract: The objective of lifelong reinforcement learning (RL) is to optimize agents which can continuously adapt and interact in changing environments. However, current RL approaches fail drastically when environments are non-stationary and interactions are non-episodic. We propose Lifelong Skill Planning (LiSP), an algorithmic framework for non-episodic lifelong RL based on planning in an abstract space of higher-order skills. We learn the skills in an unsupervised manner using intrinsic rewards and plan over the learned skills using a learned dynamics model. Moreover, our framework permits skill discovery even from offline data, thereby reducing the need for excessive real-world interactions. We demonstrate empirically that LiSP successfully enables long-horizon planning and learns agents that can avoid catastrophic failures even in challenging non-stationary and non-episodic environments derived from gridworld and MuJoCo benchmarks.

* Website link: https://sites.google.com/berkeley.edu/reset-free-lifelong-learning

Improved Contrastive Divergence Training of Energy Based Models

Dec 17, 2020

Yilun Du, Shuang Li, Joshua Tenenbaum, Igor Mordatch

Abstract: We propose several different techniques to improve contrastive divergence training of energy-based models (EBMs). First, we show that a gradient term neglected in the popular contrastive divergence formulation is both tractable to estimate and important for avoiding training instabilities in previous models. Second, we highlight how data augmentation, multi-scale processing, and reservoir sampling can be used to improve model robustness and generation quality. Third, we empirically evaluate stability of model architectures and show improved performance on a host of benchmarks and use cases, such as image generation, OOD detection, and compositional generation.

* Project webpage at https://energy-based-model.github.io/improved-contrastive-divergence

Energy-Based Models for Continual Learning

Nov 24, 2020

Shuang Li, Yilun Du, Gido M. van de Ven, Antonio Torralba, Igor Mordatch

Abstract: We motivate Energy-Based Models (EBMs) as a promising model class for continual learning problems. Instead of tackling continual learning via the use of external memory, growing models, or regularization, EBMs have a natural way to support a dynamically-growing number of tasks or classes that causes less interference with previously learned information. We find that EBMs outperform the baseline methods by a large margin on several continual learning benchmarks. We also show that EBMs are adaptable to a more general continual learning setting where the data distribution changes without the notion of explicitly delineated tasks. These observations point towards EBMs as a class of models naturally inclined towards the continual learning regime.

Rearrangement: A Challenge for Embodied AI

Nov 03, 2020

Dhruv Batra, Angel X. Chang, Sonia Chernova, Andrew J. Davison, Jia Deng, Vladlen Koltun, Sergey Levine, Jitendra Malik, Igor Mordatch, Roozbeh Mottaghi (+2 more)

Abstract: We describe a framework for research and evaluation in Embodied AI. Our proposal is based on a canonical task: Rearrangement. A standard task can focus the development of new techniques and serve as a source of trained models that can be transferred to other settings. In the rearrangement task, the goal is to bring a given physical environment into a specified state. The goal state can be specified by object poses, by images, by a description in language, or by letting the agent experience the environment in the goal state. We characterize rearrangement scenarios along different axes and describe metrics for benchmarking rearrangement performance. To facilitate research and exploration, we present experimental testbeds of rearrangement scenarios in four different simulation environments. We anticipate that other datasets will be released and new simulation platforms will be built to support training of rearrangement agents and their deployment on physical systems.

* Authors are listed in alphabetical order
