Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Alex Irpan

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Apr 04, 2022

Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog(+33 more)

Figure 1 for Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Figure 2 for Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Figure 3 for Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Figure 4 for Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Abstract:Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide real-world grounding by means of pretrained skills, which are used to constrain the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model's "hands and eyes," while the language model supplies high-level semantic knowledge about the task. We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show the need for real-world grounding and that this approach is capable of completing long-horizon, abstract, natural language instructions on a mobile manipulator. The project's website and the video can be found at https://say-can.github.io/

* See website at https://say-can.github.io/

Via

Access Paper or Ask Questions

BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning

Feb 04, 2022

Eric Jang, Alex Irpan, Mohi Khansari, Daniel Kappler, Frederik Ebert, Corey Lynch, Sergey Levine, Chelsea Finn

Figure 1 for BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning

Figure 2 for BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning

Figure 3 for BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning

Figure 4 for BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning

Abstract:In this paper, we study the problem of enabling a vision-based robotic manipulation system to generalize to novel tasks, a long-standing challenge in robot learning. We approach the challenge from an imitation learning perspective, aiming to study how scaling and broadening the data collected can facilitate such generalization. To that end, we develop an interactive and flexible imitation learning system that can learn from both demonstrations and interventions and can be conditioned on different forms of information that convey the task, including pre-trained embeddings of natural language or videos of humans performing the task. When scaling data collection on a real robot to more than 100 distinct tasks, we find that this system can perform 24 unseen manipulation tasks with an average success rate of 44%, without any robot demonstrations for those tasks.

* Conference on Robot Learning (pp. 991-1002). 2022 Jan 11
* CoRL 2021, 23 pages

Via

Access Paper or Ask Questions

AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale

Nov 11, 2021

Yao Lu, Karol Hausman, Yevgen Chebotar, Mengyuan Yan, Eric Jang, Alexander Herzog, Ted Xiao, Alex Irpan, Mohi Khansari, Dmitry Kalashnikov(+1 more)

Figure 1 for AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale

Figure 2 for AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale

Figure 3 for AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale

Figure 4 for AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale

Abstract:Robotic skills can be learned via imitation learning (IL) using user-provided demonstrations, or via reinforcement learning (RL) using large amountsof autonomously collected experience.Both methods have complementarystrengths and weaknesses: RL can reach a high level of performance, but requiresexploration, which can be very time consuming and unsafe; IL does not requireexploration, but only learns skills that are as good as the provided demonstrations.Can a single method combine the strengths of both approaches? A number ofprior methods have aimed to address this question, proposing a variety of tech-niques that integrate elements of IL and RL. However, scaling up such methodsto complex robotic skills that integrate diverse offline data and generalize mean-ingfully to real-world scenarios still presents a major challenge. In this paper, ouraim is to test the scalability of prior IL + RL algorithms and devise a system basedon detailed empirical experimentation that combines existing components in themost effective and scalable way. To that end, we present a series of experimentsaimed at understanding the implications of each design decision, so as to develop acombined approach that can utilize demonstrations and heterogeneous prior datato attain the best performance on a range of real-world and realistic simulatedrobotic problems. Our complete method, which we call AW-Opt, combines ele-ments of advantage-weighted regression [1, 2] and QT-Opt [3], providing a unifiedapproach for integrating demonstrations and offline data for robotic manipulation.Please see https://awopt.github.io for more details.

Via

Access Paper or Ask Questions

Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills

Apr 28, 2021

Yevgen Chebotar, Karol Hausman, Yao Lu, Ted Xiao, Dmitry Kalashnikov, Jake Varley, Alex Irpan, Benjamin Eysenbach, Ryan Julian, Chelsea Finn(+1 more)

Figure 1 for Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills

Figure 2 for Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills

Figure 3 for Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills

Figure 4 for Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills

Abstract:We consider the problem of learning useful robotic skills from previously collected offline data without access to manually specified rewards or additional online exploration, a setting that is becoming increasingly important for scaling robot learning by reusing past robotic data. In particular, we propose the objective of learning a functional understanding of the environment by learning to reach any goal state in a given dataset. We employ goal-conditioned Q-learning with hindsight relabeling and develop several techniques that enable training in a particularly challenging offline setting. We find that our method can operate on high-dimensional camera images and learn a variety of skills on real robots that generalize to previously unseen scenes and objects. We also show that our method can learn to reach long-horizon goals across multiple episodes, and learn rich representations that can help with downstream tasks through pre-training or auxiliary objectives. The videos of our experiments can be found at https://actionable-models.github.io

Via

Access Paper or Ask Questions

Meta-Learning Requires Meta-Augmentation

Jul 10, 2020

Janarthanan Rajendran, Alex Irpan, Eric Jang

Figure 1 for Meta-Learning Requires Meta-Augmentation

Figure 2 for Meta-Learning Requires Meta-Augmentation

Figure 3 for Meta-Learning Requires Meta-Augmentation

Figure 4 for Meta-Learning Requires Meta-Augmentation

Abstract:Meta-learning algorithms aim to learn two components: a model that predicts targets for a task, and a base learner that quickly updates that model when given examples from a new task. This additional level of learning can be powerful, but it also creates another potential source for overfitting, since we can now overfit in either the model or the base learner. We describe both of these forms of metalearning overfitting, and demonstrate that they appear experimentally in common meta-learning benchmarks. We then use an information-theoretic framework to discuss meta-augmentation, a way to add randomness that discourages the base learner and model from learning trivial solutions that do not generalize to new tasks. We demonstrate that meta-augmentation produces large complementary benefits to recently proposed meta-regularization techniques.

* 14 pages, 8 figures. Code at https://github.com/google-research/google-research/tree/master/maml_nonexclusive

Via

Access Paper or Ask Questions

RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real

Jun 16, 2020

Kanishka Rao, Chris Harris, Alex Irpan, Sergey Levine, Julian Ibarz, Mohi Khansari

Figure 1 for RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real

Figure 2 for RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real

Figure 3 for RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real

Figure 4 for RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real

Abstract:Deep neural network based reinforcement learning (RL) can learn appropriate visual representations for complex tasks like vision-based robotic grasping without the need for manually engineering or prior learning a perception system. However, data for RL is collected via running an agent in the desired environment, and for applications like robotics, running a robot in the real world may be extremely costly and time consuming. Simulated training offers an appealing alternative, but ensuring that policies trained in simulation can transfer effectively into the real world requires additional machinery. Simulations may not match reality, and typically bridging the simulation-to-reality gap requires domain knowledge and task-specific engineering. We can automate this process by employing generative models to translate simulated images into realistic ones. However, this sort of translation is typically task-agnostic, in that the translated images may not preserve all features that are relevant to the task. In this paper, we introduce the RL-scene consistency loss for image translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image. This allows us to learn a task-aware translation. Incorporating this loss into unsupervised domain translation, we obtain RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning. In evaluations of RL-CycleGAN on two vision-based robotics grasping tasks, we show that RL-CycleGAN offers a substantial improvement over a number of prior methods for sim-to-real transfer, attaining excellent real-world performance with only a modest number of real-world observations.

* Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2020)

Via

Access Paper or Ask Questions

Off-Policy Evaluation via Off-Policy Classification

Jun 20, 2019

Alex Irpan, Kanishka Rao, Konstantinos Bousmalis, Chris Harris, Julian Ibarz, Sergey Levine

Figure 1 for Off-Policy Evaluation via Off-Policy Classification

Figure 2 for Off-Policy Evaluation via Off-Policy Classification

Figure 3 for Off-Policy Evaluation via Off-Policy Classification

Figure 4 for Off-Policy Evaluation via Off-Policy Classification

Abstract:In this work, we consider the problem of model selection for deep reinforcement learning (RL) in real-world environments. Typically, the performance of deep RL algorithms is evaluated via on-policy interactions with the target environment. However, comparing models in a real-world environment for the purposes of early stopping or hyperparameter tuning is costly and often practically infeasible. This leads us to examine off-policy policy evaluation (OPE) in such settings. We focus on OPE for value-based methods, which are of particular interest in deep RL, with applications like robotics, where off-policy algorithms based on Q-function estimation can often attain better sample complexity than direct policy optimization. Existing OPE metrics either rely on a model of the environment, or the use of importance sampling (IS) to correct for the data being off-policy. However, for high-dimensional observations, such as images, models of the environment can be difficult to fit and value-based methods can make IS hard to use or even ill-conditioned, especially when dealing with continuous action spaces. In this paper, we focus on the specific case of MDPs with continuous action spaces and sparse binary rewards, which is representative of many important real-world applications. We propose an alternative metric that relies on neither models nor IS, by framing OPE as a positive-unlabeled (PU) classification problem with the Q-function as the decision function. We experimentally show that this metric outperforms baselines on a number of tasks. Most importantly, it can reliably predict the relative performance of different policies in a number of generalization scenarios, including the transfer to the real-world of policies trained in simulation for an image-based robotic manipulation task.

* Accepted to ICML 2019 RL4RealLife workshop

Via

Access Paper or Ask Questions

The Principle of Unchanged Optimality in Reinforcement Learning Generalization

Jun 02, 2019

Alex Irpan, Xingyou Song

Abstract:Several recent papers have examined generalization in reinforcement learning (RL), by proposing new environments or ways to add noise to existing environments, then benchmarking algorithms and model architectures on those environments. We discuss subtle conceptual properties of RL benchmarks that are not required in supervised learning (SL), and also properties that an RL benchmark should possess. Chief among them is one we call the principle of unchanged optimality: there should exist a single $\pi$ that is optimal across all train and test tasks. In this work, we argue why this principle is important, and ways it can be broken or satisfied due to subtle choices in state representation or model architecture. We conclude by discussing challenges and future lines of research in theoretically analyzing generalization benchmarks.

* Published at ICML 2019 Workshop "Understanding and Improving Generalization in Deep Learning"

Via

Access Paper or Ask Questions

Sim-to-Real via Sim-to-Sim: Data-efficient Robotic Grasping via Randomized-to-Canonical Adaptation Networks

Mar 25, 2019

Stephen James, Paul Wohlhart, Mrinal Kalakrishnan, Dmitry Kalashnikov, Alex Irpan, Julian Ibarz, Sergey Levine, Raia Hadsell, Konstantinos Bousmalis

Figure 1 for Sim-to-Real via Sim-to-Sim: Data-efficient Robotic Grasping via Randomized-to-Canonical Adaptation Networks

Figure 2 for Sim-to-Real via Sim-to-Sim: Data-efficient Robotic Grasping via Randomized-to-Canonical Adaptation Networks

Figure 3 for Sim-to-Real via Sim-to-Sim: Data-efficient Robotic Grasping via Randomized-to-Canonical Adaptation Networks

Figure 4 for Sim-to-Real via Sim-to-Sim: Data-efficient Robotic Grasping via Randomized-to-Canonical Adaptation Networks

Abstract:Real world data, especially in the domain of robotics, is notoriously costly to collect. One way to circumvent this can be to leverage the power of simulation to produce large amounts of labelled data. However, training models on simulated images does not readily transfer to real-world ones. Using domain adaptation methods to cross this "reality gap" requires a large amount of unlabelled real-world data, whilst domain randomization alone can waste modeling power. In this paper, we present Randomized-to-Canonical Adaptation Networks (RCANs), a novel approach to crossing the visual reality gap that uses no real-world data. Our method learns to translate randomized rendered images into their equivalent non-randomized, canonical versions. This in turn allows for real images to also be translated into canonical sim images. We demonstrate the effectiveness of this sim-to-real approach by training a vision-based closed-loop grasping reinforcement learning agent in simulation, and then transferring it to the real world to attain 70% zero-shot grasp success on unseen objects, a result that almost doubles the success of learning the same task directly on domain randomization alone. Additionally, by joint finetuning in the real-world with only 5,000 real-world grasps, our method achieves 91%, attaining comparable performance to a state-of-the-art system trained with 580,000 real-world grasps, resulting in a reduction of real-world data by more than 99%.

* To be published in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019)

Via

Access Paper or Ask Questions

QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation

Nov 28, 2018

Dmitry Kalashnikov, Alex Irpan, Peter Pastor, Julian Ibarz, Alexander Herzog, Eric Jang, Deirdre Quillen, Ethan Holly, Mrinal Kalakrishnan, Vincent Vanhoucke(+1 more)

Figure 1 for QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation

Figure 2 for QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation

Figure 3 for QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation

Figure 4 for QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation

Abstract:In this paper, we study the problem of learning vision-based dynamic manipulation skills using a scalable reinforcement learning approach. We study this problem in the context of grasping, a longstanding challenge in robotic manipulation. In contrast to static learning behaviors that choose a grasp point and then execute the desired grasp, our method enables closed-loop vision-based control, whereby the robot continuously updates its grasp strategy based on the most recent observations to optimize long-horizon grasp success. To that end, we introduce QT-Opt, a scalable self-supervised vision-based reinforcement learning framework that can leverage over 580k real-world grasp attempts to train a deep neural network Q-function with over 1.2M parameters to perform closed-loop, real-world grasping that generalizes to 96% grasp success on unseen objects. Aside from attaining a very high success rate, our method exhibits behaviors that are quite distinct from more standard grasping systems: using only RGB vision-based perception from an over-the-shoulder camera, our method automatically learns regrasping strategies, probes objects to find the most effective grasps, learns to reposition objects and perform other non-prehensile pre-grasp manipulations, and responds dynamically to disturbances and perturbations.

* CoRL 2018 camera ready. 23 pages, 14 figures

Via

Access Paper or Ask Questions