Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Eric Jang

Watch, Try, Learn: Meta-Learning from Demonstrations and Reward

Jun 07, 2019

Allan Zhou, Eric Jang, Daniel Kappler, Alex Herzog, Mohi Khansari, Paul Wohlhart, Yunfei Bai, Mrinal Kalakrishnan, Sergey Levine, Chelsea Finn

Figure 1 for Watch, Try, Learn: Meta-Learning from Demonstrations and Reward

Figure 2 for Watch, Try, Learn: Meta-Learning from Demonstrations and Reward

Figure 3 for Watch, Try, Learn: Meta-Learning from Demonstrations and Reward

Figure 4 for Watch, Try, Learn: Meta-Learning from Demonstrations and Reward

Abstract:Imitation learning allows agents to learn complex behaviors from demonstrations. However, learning a complex vision-based task may require an impractical number of demonstrations. Meta-imitation learning is a promising approach towards enabling agents to learn a new task from one or a few demonstrations by leveraging experience from learning similar tasks. In the presence of task ambiguity or unobserved dynamics, demonstrations alone may not provide enough information; an agent must also try the task to successfully infer a policy. In this work, we propose a method that can learn to learn from both demonstrations and trial-and-error experience with sparse reward feedback. In comparison to meta-imitation, this approach enables the agent to effectively and efficiently improve itself autonomously beyond the demonstration data. In comparison to meta-reinforcement learning, we can scale to substantially broader distributions of tasks, as the demonstration reduces the burden of exploration. Our experiments show that our method significantly outperforms prior approaches on a set of challenging, vision-based control tasks.

Via

Access Paper or Ask Questions

QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation

Nov 28, 2018

Dmitry Kalashnikov, Alex Irpan, Peter Pastor, Julian Ibarz, Alexander Herzog, Eric Jang, Deirdre Quillen, Ethan Holly, Mrinal Kalakrishnan, Vincent Vanhoucke(+1 more)

Figure 1 for QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation

Figure 2 for QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation

Figure 3 for QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation

Figure 4 for QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation

Abstract:In this paper, we study the problem of learning vision-based dynamic manipulation skills using a scalable reinforcement learning approach. We study this problem in the context of grasping, a longstanding challenge in robotic manipulation. In contrast to static learning behaviors that choose a grasp point and then execute the desired grasp, our method enables closed-loop vision-based control, whereby the robot continuously updates its grasp strategy based on the most recent observations to optimize long-horizon grasp success. To that end, we introduce QT-Opt, a scalable self-supervised vision-based reinforcement learning framework that can leverage over 580k real-world grasp attempts to train a deep neural network Q-function with over 1.2M parameters to perform closed-loop, real-world grasping that generalizes to 96% grasp success on unseen objects. Aside from attaining a very high success rate, our method exhibits behaviors that are quite distinct from more standard grasping systems: using only RGB vision-based perception from an over-the-shoulder camera, our method automatically learns regrasping strategies, probes objects to find the most effective grasps, learns to reposition objects and perform other non-prehensile pre-grasp manipulations, and responds dynamically to disturbances and perturbations.

* CoRL 2018 camera ready. 23 pages, 14 figures

Via

Access Paper or Ask Questions

Grasp2Vec: Learning Object Representations from Self-Supervised Grasping

Nov 19, 2018

Eric Jang, Coline Devin, Vincent Vanhoucke, Sergey Levine

Figure 1 for Grasp2Vec: Learning Object Representations from Self-Supervised Grasping

Figure 2 for Grasp2Vec: Learning Object Representations from Self-Supervised Grasping

Figure 3 for Grasp2Vec: Learning Object Representations from Self-Supervised Grasping

Figure 4 for Grasp2Vec: Learning Object Representations from Self-Supervised Grasping

Abstract:Well structured visual representations can make robot learning faster and can improve generalization. In this paper, we study how we can acquire effective object-centric representations for robotic manipulation tasks without human labeling by using autonomous robot interaction with the environment. Such representation learning methods can benefit from continuous refinement of the representation as the robot collects more experience, allowing them to scale effectively without human intervention. Our representation learning approach is based on object persistence: when a robot removes an object from a scene, the representation of that scene should change according to the features of the object that was removed. We formulate an arithmetic relationship between feature vectors from this observation, and use it to learn a representation of scenes and objects that can then be used to identify object instances, localize them in the scene, and perform goal-directed grasping tasks where the robot must retrieve commanded objects from a bin. The same grasping procedure can also be used to automatically collect training data for our method, by recording images of scenes, grasping and removing an object, and recording the outcome. Our experiments demonstrate that this self-supervised approach for tasked grasping substantially outperforms direct reinforcement learning from images and prior representation learning methods.

* Proceedings of The 2nd Conference on Robot Learning, in PMLR 87:99-112 (2018)
* CoRL 2018. Eric Jang and Coline Devin contributed equally to this work

Via

Access Paper or Ask Questions

Generative Ensembles for Robust Anomaly Detection

Oct 24, 2018

Hyunsun Choi, Eric Jang

Figure 1 for Generative Ensembles for Robust Anomaly Detection

Figure 2 for Generative Ensembles for Robust Anomaly Detection

Figure 3 for Generative Ensembles for Robust Anomaly Detection

Figure 4 for Generative Ensembles for Robust Anomaly Detection

Abstract:Deep generative models are capable of learning probability distributions over large, high-dimensional datasets such as images, video and natural language. Generative models trained on samples from $p(x)$ ought to assign low likelihoods to out-of-distribution (OoD) samples from $q(x)$, making them suitable for anomaly detection applications. We show that in practice, likelihood models are themselves susceptible to OoD errors, and even assign large likelihoods to images from other natural datasets. To mitigate these issues, we propose Generative Ensembles, a model-independent technique for OoD detection that combines density-based anomaly detection with uncertainty estimation. Our method outperforms ODIN and VIB baselines on image datasets, and achieves comparable performance to a classification model on the Kaggle Credit Fraud dataset.

Via

Access Paper or Ask Questions

Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods

Mar 28, 2018

Deirdre Quillen, Eric Jang, Ofir Nachum, Chelsea Finn, Julian Ibarz, Sergey Levine

Figure 1 for Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods

Figure 2 for Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods

Figure 3 for Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods

Figure 4 for Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods

Abstract:In this paper, we explore deep reinforcement learning algorithms for vision-based robotic grasping. Model-free deep reinforcement learning (RL) has been successfully applied to a range of challenging environments, but the proliferation of algorithms makes it difficult to discern which particular approach would be best suited for a rich, diverse task like grasping. To answer this question, we propose a simulated benchmark for robotic grasping that emphasizes off-policy learning and generalization to unseen objects. Off-policy learning enables utilization of grasping data over a wide variety of objects, and diversity is important to enable the method to generalize to new objects that were not seen during training. We evaluate the benchmark tasks against a variety of Q-function estimation methods, a method previously proposed for robotic grasping with deep neural network models, and a novel approach based on a combination of Monte Carlo return estimation and an off-policy correction. Our results indicate that several simple methods provide a surprisingly strong competitor to popular algorithms such as double Q-learning, and our analysis of stability sheds light on the relative tradeoffs between the algorithms.

* 8 pages

Via

Access Paper or Ask Questions

Time-Contrastive Networks: Self-Supervised Learning from Video

Mar 20, 2018

Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine

Figure 1 for Time-Contrastive Networks: Self-Supervised Learning from Video

Figure 2 for Time-Contrastive Networks: Self-Supervised Learning from Video

Figure 3 for Time-Contrastive Networks: Self-Supervised Learning from Video

Figure 4 for Time-Contrastive Networks: Self-Supervised Learning from Video

Abstract:We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses. Imitation of human behavior requires a viewpoint-invariant representation that captures the relationships between end-effectors (hands or robot grippers) and the environment, object attributes, and body pose. We train our representations using a metric learning loss, where multiple simultaneous viewpoints of the same observation are attracted in the embedding space, while being repelled from temporal neighbors which are often visually similar but functionally different. In other words, the model simultaneously learns to recognize what is common between different-looking images, and what is different between similar-looking images. This signal causes our model to discover attributes that do not change across viewpoint, but do change across time, while ignoring nuisance variables such as occlusions, motion blur, lighting and background. We demonstrate that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm. While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human. Reward functions obtained by following the human demonstrations under the learned representation enable efficient reinforcement learning that is practical for real-world robotic systems. Video results, open-source code and dataset are available at https://sermanet.github.io/imitate

Via

Access Paper or Ask Questions

Sim2Real View Invariant Visual Servoing by Recurrent Control

Dec 20, 2017

Fereshteh Sadeghi, Alexander Toshev, Eric Jang, Sergey Levine

Figure 1 for Sim2Real View Invariant Visual Servoing by Recurrent Control

Figure 2 for Sim2Real View Invariant Visual Servoing by Recurrent Control

Figure 3 for Sim2Real View Invariant Visual Servoing by Recurrent Control

Figure 4 for Sim2Real View Invariant Visual Servoing by Recurrent Control

Abstract:Humans are remarkably proficient at controlling their limbs and tools from a wide range of viewpoints and angles, even in the presence of optical distortions. In robotics, this ability is referred to as visual servoing: moving a tool or end-point to a desired location using primarily visual feedback. In this paper, we study how viewpoint-invariant visual servoing skills can be learned automatically in a robotic manipulation scenario. To this end, we train a deep recurrent controller that can automatically determine which actions move the end-point of a robotic arm to a desired object. The problem that must be solved by this controller is fundamentally ambiguous: under severe variation in viewpoint, it may be impossible to determine the actions in a single feedforward operation. Instead, our visual servoing system must use its memory of past movements to understand how the actions affect the robot motion from the current viewpoint, correcting mistakes and gradually moving closer to the target. This ability is in stark contrast to most visual servoing methods, which either assume known dynamics or require a calibration phase. We show how we can learn this recurrent controller using simulated data and a reinforcement learning objective. We then describe how the resulting model can be transferred to a real-world robot by disentangling perception from control and only adapting the visual layers. The adapted model can servo to previously unseen objects from novel viewpoints on a real-world Kuka IIWA robotic arm. For supplementary videos, see: https://fsadeghi.github.io/Sim2RealViewInvariantServo

* Supplementary video: https://fsadeghi.github.io/Sim2RealViewInvariantServo

Via

Access Paper or Ask Questions

End-to-End Learning of Semantic Grasping

Nov 09, 2017

Eric Jang, Sudheendra Vijayanarasimhan, Peter Pastor, Julian Ibarz, Sergey Levine

Figure 1 for End-to-End Learning of Semantic Grasping

Figure 2 for End-to-End Learning of Semantic Grasping

Figure 3 for End-to-End Learning of Semantic Grasping

Figure 4 for End-to-End Learning of Semantic Grasping

Abstract:We consider the task of semantic robotic grasping, in which a robot picks up an object of a user-specified class using only monocular images. Inspired by the two-stream hypothesis of visual reasoning, we present a semantic grasping framework that learns object detection, classification, and grasp planning in an end-to-end fashion. A "ventral stream" recognizes object class while a "dorsal stream" simultaneously interprets the geometric relationships necessary to execute successful grasps. We leverage the autonomous data collection capabilities of robots to obtain a large self-supervised dataset for training the dorsal stream, and use semi-supervised label propagation to train the ventral stream with only a modest amount of human supervision. We experimentally show that our approach improves upon grasping systems whose components are not learned end-to-end, including a baseline method that uses bounding box detection. Furthermore, we show that jointly training our model with auxiliary data consisting of non-semantic grasping data, as well as semantically labeled images without grasp actions, has the potential to substantially improve semantic grasping performance.

* 14 pages

Via

Access Paper or Ask Questions

Categorical Reparameterization with Gumbel-Softmax

Aug 05, 2017

Eric Jang, Shixiang Gu, Ben Poole

Figure 1 for Categorical Reparameterization with Gumbel-Softmax

Figure 2 for Categorical Reparameterization with Gumbel-Softmax

Figure 3 for Categorical Reparameterization with Gumbel-Softmax

Figure 4 for Categorical Reparameterization with Gumbel-Softmax

Abstract:Categorical variables are a natural choice for representing discrete structure in the world. However, stochastic neural networks rarely use categorical latent variables due to the inability to backpropagate through samples. In this work, we present an efficient gradient estimator that replaces the non-differentiable sample from a categorical distribution with a differentiable sample from a novel Gumbel-Softmax distribution. This distribution has the essential property that it can be smoothly annealed into a categorical distribution. We show that our Gumbel-Softmax estimator outperforms state-of-the-art gradient estimators on structured output prediction and unsupervised generative modeling tasks with categorical latent variables, and enables large speedups on semi-supervised classification.

Via

Access Paper or Ask Questions