Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lerrel Pinto

RB2: Robotic Manipulation Benchmarking with a Twist

Mar 15, 2022

Sudeep Dasari, Jianren Wang, Joyce Hong, Shikhar Bahl, Yixin Lin, Austin Wang, Abitha Thankaraj, Karanbir Chahal, Berk Calli, Saurabh Gupta(+5 more)

Figure 1 for RB2: Robotic Manipulation Benchmarking with a Twist

Figure 2 for RB2: Robotic Manipulation Benchmarking with a Twist

Figure 3 for RB2: Robotic Manipulation Benchmarking with a Twist

Figure 4 for RB2: Robotic Manipulation Benchmarking with a Twist

Abstract:Benchmarks offer a scientific way to compare algorithms using objective performance metrics. Good benchmarks have two features: (a) they should be widely useful for many research groups; (b) and they should produce reproducible findings. In robotic manipulation research, there is a trade-off between reproducibility and broad accessibility. If the benchmark is kept restrictive (fixed hardware, objects), the numbers are reproducible but the setup becomes less general. On the other hand, a benchmark could be a loose set of protocols (e.g. object sets) but the underlying variation in setups make the results non-reproducible. In this paper, we re-imagine benchmarking for robotic manipulation as state-of-the-art algorithmic implementations, alongside the usual set of tasks and experimental protocols. The added baseline implementations will provide a way to easily recreate SOTA numbers in a new local robotic setup, thus providing credible relative rankings between existing approaches and new work. However, these local rankings could vary between different setups. To resolve this issue, we build a mechanism for pooling experimental data between labs, and thus we establish a single global ranking for existing (and proposed) SOTA algorithms. Our benchmark, called Ranking-Based Robotics Benchmark (RB2), is evaluated on tasks that are inspired from clinically validated Southampton Hand Assessment Procedures. Our benchmark was run across two different labs and reveals several surprising findings. For example, extremely simple baselines like open-loop behavior cloning, outperform more complicated models (e.g. closed loop, RNN, Offline-RL, etc.) that are preferred by the field. We hope our fellow researchers will use RB2 to improve their research's quality and rigor.

* accepted at the NeurIPS 2021 Datasets and Benchmarks Track

Via

Access Paper or Ask Questions

Context is Everything: Implicit Identification for Dynamics Adaptation

Mar 10, 2022

Ben Evans, Abitha Thankaraj, Lerrel Pinto

Figure 1 for Context is Everything: Implicit Identification for Dynamics Adaptation

Figure 2 for Context is Everything: Implicit Identification for Dynamics Adaptation

Figure 3 for Context is Everything: Implicit Identification for Dynamics Adaptation

Figure 4 for Context is Everything: Implicit Identification for Dynamics Adaptation

Abstract:Understanding environment dynamics is necessary for robots to act safely and optimally in the world. In realistic scenarios, dynamics are non-stationary and the causal variables such as environment parameters cannot necessarily be precisely measured or inferred, even during training. We propose Implicit Identification for Dynamics Adaptation (IIDA), a simple method to allow predictive models to adapt to changing environment dynamics. IIDA assumes no access to the true variations in the world and instead implicitly infers properties of the environment from a small amount of contextual data. We demonstrate IIDA's ability to perform well in unseen environments through a suite of simulated experiments on MuJoCo environments and a real robot dynamic sliding task. In general, IIDA significantly reduces model error and results in higher task performance over commonly used methods. Our code and robot videos are at https://bennevans.github.io/iida/

* Accepted at ICRA 2022

Via

Access Paper or Ask Questions

Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning

Feb 08, 2022

Denis Yarats, David Brandfonbrener, Hao Liu, Michael Laskin, Pieter Abbeel, Alessandro Lazaric, Lerrel Pinto

Figure 1 for Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning

Figure 2 for Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning

Figure 3 for Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning

Figure 4 for Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning

Abstract:Recent progress in deep learning has relied on access to large and diverse datasets. Such data-driven progress has been less evident in offline reinforcement learning (RL), because offline RL data is usually collected to optimize specific target tasks limiting the data's diversity. In this work, we propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL. ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL. We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks. Our findings suggest that data generation is as important as algorithmic advances for offline RL and hence requires careful consideration from the community. Code and data can be found at https://github.com/denisyarats/exorl .

Via

Access Paper or Ask Questions

The Surprising Effectiveness of Representation Learning for Visual Imitation

Dec 06, 2021

Jyothish Pari, Nur Muhammad Shafiullah, Sridhar Pandian Arunachalam, Lerrel Pinto

Figure 1 for The Surprising Effectiveness of Representation Learning for Visual Imitation

Figure 2 for The Surprising Effectiveness of Representation Learning for Visual Imitation

Figure 3 for The Surprising Effectiveness of Representation Learning for Visual Imitation

Figure 4 for The Surprising Effectiveness of Representation Learning for Visual Imitation

Abstract:While visual imitation learning offers one of the most effective ways of learning from visual demonstrations, generalizing from them requires either hundreds of diverse demonstrations, task specific priors, or large, hard-to-train parametric models. One reason such complexities arise is because standard visual imitation frameworks try to solve two coupled problems at once: learning a succinct but good representation from the diverse visual data, while simultaneously learning to associate the demonstrated actions with such representations. Such joint learning causes an interdependence between these two problems, which often results in needing large amounts of demonstrations for learning. To address this challenge, we instead propose to decouple representation learning from behavior learning for visual imitation. First, we learn a visual representation encoder from offline data using standard supervised and self-supervised learning methods. Once the representations are trained, we use non-parametric Locally Weighted Regression to predict the actions. We experimentally show that this simple decoupling improves the performance of visual imitation models on both offline demonstration datasets and real-robot door opening compared to prior work in visual imitation. All of our generated data, code, and robot videos are publicly available at https://jyopari.github.io/VINN/.

* The first two authors contributed equally

Via

Access Paper or Ask Questions

URLB: Unsupervised Reinforcement Learning Benchmark

Oct 28, 2021

Michael Laskin, Denis Yarats, Hao Liu, Kimin Lee, Albert Zhan, Kevin Lu, Catherine Cang, Lerrel Pinto, Pieter Abbeel

Figure 1 for URLB: Unsupervised Reinforcement Learning Benchmark

Figure 2 for URLB: Unsupervised Reinforcement Learning Benchmark

Figure 3 for URLB: Unsupervised Reinforcement Learning Benchmark

Figure 4 for URLB: Unsupervised Reinforcement Learning Benchmark

Abstract:Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks. Yet training generalist agents that can quickly adapt to new tasks remains an outstanding challenge. Recent advances in unsupervised RL have shown that pre-training RL agents with self-supervised intrinsic rewards can result in efficient adaptation. However, these algorithms have been hard to compare and develop due to the lack of a unified benchmark. To this end, we introduce the Unsupervised Reinforcement Learning Benchmark (URLB). URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards. Building on the DeepMind Control Suite, we provide twelve continuous control tasks from three domains for evaluation and open-source code for eight leading unsupervised RL methods. We find that the implemented baselines make progress but are not able to solve URLB and propose directions for future research.

* Code for the Unsupervised Reinforcement Learning Benchmark is available at https://github.com/rll-research/url_benchmark

Via

Access Paper or Ask Questions

Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning

Jul 20, 2021

Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto

Figure 1 for Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning

Figure 2 for Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning

Figure 3 for Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning

Figure 4 for Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning

Abstract:We present DrQ-v2, a model-free reinforcement learning (RL) algorithm for visual continuous control. DrQ-v2 builds on DrQ, an off-policy actor-critic approach that uses data augmentation to learn directly from pixels. We introduce several improvements that yield state-of-the-art results on the DeepMind Control Suite. Notably, DrQ-v2 is able to solve complex humanoid locomotion tasks directly from pixel observations, previously unattained by model-free RL. DrQ-v2 is conceptually simple, easy to implement, and provides significantly better computational footprint compared to prior work, with the majority of tasks taking just 8 hours to train on a single GPU. Finally, we publicly release DrQ-v2's implementation to provide RL practitioners with a strong and computationally efficient baseline.

Via

Access Paper or Ask Questions

Playful Interactions for Representation Learning

Jul 19, 2021

Sarah Young, Jyothish Pari, Pieter Abbeel, Lerrel Pinto

Figure 1 for Playful Interactions for Representation Learning

Figure 2 for Playful Interactions for Representation Learning

Figure 3 for Playful Interactions for Representation Learning

Figure 4 for Playful Interactions for Representation Learning

Abstract:One of the key challenges in visual imitation learning is collecting large amounts of expert demonstrations for a given task. While methods for collecting human demonstrations are becoming easier with teleoperation methods and the use of low-cost assistive tools, we often still require 100-1000 demonstrations for every task to learn a visual representation and policy. To address this, we turn to an alternate form of data that does not require task-specific demonstrations -- play. Playing is a fundamental method children use to learn a set of skills and behaviors and visual representations in early learning. Importantly, play data is diverse, task-agnostic, and relatively cheap to obtain. In this work, we propose to use playful interactions in a self-supervised manner to learn visual representations for downstream tasks. We collect 2 hours of playful data in 19 diverse environments and use self-predictive learning to extract visual representations. Given these representations, we train policies using imitation learning for two downstream tasks: Pushing and Stacking. We demonstrate that our visual representations generalize better than standard behavior cloning and can achieve similar performance with only half the number of required demonstrations. Our representations, which are trained from scratch, compare favorably against ImageNet pretrained representations. Finally, we provide an experimental analysis on the effects of different pretraining modes on downstream task learning.

Via

Access Paper or Ask Questions

GEM: Group Enhanced Model for Learning Dynamical Control Systems

Apr 07, 2021

Philippe Hansen-Estruch, Wenling Shang, Lerrel Pinto, Pieter Abbeel, Stas Tiomkin

Figure 1 for GEM: Group Enhanced Model for Learning Dynamical Control Systems

Figure 2 for GEM: Group Enhanced Model for Learning Dynamical Control Systems

Figure 3 for GEM: Group Enhanced Model for Learning Dynamical Control Systems

Figure 4 for GEM: Group Enhanced Model for Learning Dynamical Control Systems

Abstract:Learning the dynamics of a physical system wherein an autonomous agent operates is an important task. Often these systems present apparent geometric structures. For instance, the trajectories of a robotic manipulator can be broken down into a collection of its transitional and rotational motions, fully characterized by the corresponding Lie groups and Lie algebras. In this work, we take advantage of these structures to build effective dynamical models that are amenable to sample-based learning. We hypothesize that learning the dynamics on a Lie algebra vector space is more effective than learning a direct state transition model. To verify this hypothesis, we introduce the Group Enhanced Model (GEM). GEMs significantly outperform conventional transition models on tasks of long-term prediction, planning, and model-based reinforcement learning across a diverse suite of standard continuous-control environments, including Walker, Hopper, Reacher, Half-Cheetah, Inverted Pendulums, Ant, and Humanoid. Furthermore, plugging GEM into existing state of the art systems enhances their performance, which we demonstrate on the PETS system. This work sheds light on a connection between learning of dynamics and Lie group properties, which opens doors for new research directions and practical applications along this direction. Our code is publicly available at: https://tinyurl.com/GEMMBRL.

* 14 pages, 8 figures

Via

Access Paper or Ask Questions

Simultaneous Navigation and Construction Benchmarking Environments

Mar 31, 2021

Wenyu Han, Chen Feng, Haoran Wu, Alexander Gao, Armand Jordana, Dong Liu, Lerrel Pinto, Ludovic Righetti

Figure 1 for Simultaneous Navigation and Construction Benchmarking Environments

Figure 2 for Simultaneous Navigation and Construction Benchmarking Environments

Figure 3 for Simultaneous Navigation and Construction Benchmarking Environments

Figure 4 for Simultaneous Navigation and Construction Benchmarking Environments

Abstract:We need intelligent robots for mobile construction, the process of navigating in an environment and modifying its structure according to a geometric design. In this task, a major robot vision and learning challenge is how to exactly achieve the design without GPS, due to the difficulty caused by the bi-directional coupling of accurate robot localization and navigation together with strategic environment manipulation. However, many existing robot vision and learning tasks such as visual navigation and robot manipulation address only one of these two coupled aspects. To stimulate the pursuit of a generic and adaptive solution, we reasonably simplify mobile construction as a partially observable Markov decision process (POMDP) in 1/2/3D grid worlds and benchmark the performance of a handcrafted policy with basic localization and planning, and state-of-the-art deep reinforcement learning (RL) methods. Our extensive experiments show that the coupling makes this problem very challenging for those methods, and emphasize the need for novel task-specific solutions.

Via

Access Paper or Ask Questions

Task-Agnostic Morphology Evolution

Feb 25, 2021

Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto

Figure 1 for Task-Agnostic Morphology Evolution

Figure 2 for Task-Agnostic Morphology Evolution

Figure 3 for Task-Agnostic Morphology Evolution

Figure 4 for Task-Agnostic Morphology Evolution

Abstract:Deep reinforcement learning primarily focuses on learning behavior, usually overlooking the fact that an agent's function is largely determined by form. So, how should one go about finding a morphology fit for solving tasks in a given environment? Current approaches that co-adapt morphology and behavior use a specific task's reward as a signal for morphology optimization. However, this often requires expensive policy optimization and results in task-dependent morphologies that are not built to generalize. In this work, we propose a new approach, Task-Agnostic Morphology Evolution (TAME), to alleviate both of these issues. Without any task or reward specification, TAME evolves morphologies by only applying randomly sampled action primitives on a population of agents. This is accomplished using an information-theoretic objective that efficiently ranks agents by their ability to reach diverse states in the environment and the causality of their actions. Finally, we empirically demonstrate that across 2D, 3D, and manipulation environments TAME can evolve morphologies that match the multi-task performance of those learned with task supervised algorithms. Our code and videos can be found at https://sites.google.com/view/task-agnostic-evolution.

* ICLR 2021

Via

Access Paper or Ask Questions