Jonathan Tremblay

Hierarchical Planning for Long-Horizon Manipulation with Geometric and Symbolic Scene Graphs

Dec 14, 2020
Yifeng Zhu, Jonathan Tremblay, Stan Birchfield, Yuke Zhu

We present a visually grounded hierarchical planning algorithm for long-horizon manipulation tasks. Our algorithm offers a joint framework of neuro-symbolic task planning and low-level motion generation conditioned on the specified goal. At the core of our approach is a two-level scene graph representation, namely a geometric scene graph and a symbolic scene graph. This hierarchical representation serves as a structured, object-centric abstraction of manipulation scenes. Our model uses graph neural networks to process these scene graphs for predicting high-level task plans and low-level motions. We demonstrate that our method scales to long-horizon tasks and generalizes well to novel task goals. We validate our method in a kitchen storage task in both physical simulation and the real world. Our experiments show that our method achieves a success rate of over 70% and a subgoal completion rate of nearly 90% on the real robot, while being four orders of magnitude faster in computation time than a standard search-based task-and-motion planner.

Fast Uncertainty Quantification for Deep Object Pose Estimation

Nov 16, 2020
Guanya Shi, Yifeng Zhu, Jonathan Tremblay, Stan Birchfield, Fabio Ramos, Animashree Anandkumar, Yuke Zhu

Deep learning-based object pose estimators are often unreliable and overconfident, especially when the input image is outside the training domain, for instance, with sim2real transfer. Efficient and robust uncertainty quantification (UQ) in pose estimators is critically needed in many robotic tasks. In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation. We ensemble 2-3 pre-trained models with different neural network architectures and/or training data sources, and compute their average pairwise disagreement against one another to obtain the uncertainty quantification. We propose four disagreement metrics, including a learned metric, and show that the average distance (ADD) is the best learning-free metric and that it is only slightly worse than the learned metric, which requires labeled target data. Our method has several advantages compared to the prior art: 1) our method does not require any modification of the training process or the model inputs; and 2) it needs only one forward pass for each model. We evaluate the proposed UQ method on three tasks where our uncertainty quantification yields much stronger correlations with pose estimation errors than the baselines. Moreover, in a real robot grasping task, our method increases the grasping success rate from 35% to 90%.

* Video and code are available at https://sites.google.com/view/fastuq
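
The average pairwise disagreement described in the abstract above can be illustrated with a minimal sketch. This is not the authors' code (that is linked above). It assumes each ensemble member returns a pose as a rotation matrix and a translation, and that a set of 3D model points is available. The function names (`transform`, `add_distance`, `ensemble_disagreement`) and the toy inputs are hypothetical.

```python
import math
from itertools import combinations


def transform(pose, point):
    """Apply a rigid pose (R, t) to a 3D point; R is a 3x3 nested list, t has length 3."""
    R, t = pose
    return [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]


def add_distance(pose_a, pose_b, model_points):
    """Average distance (ADD) between the model points as placed by two poses."""
    total = 0.0
    for p in model_points:
        total += math.dist(transform(pose_a, p), transform(pose_b, p))
    return total / len(model_points)


def ensemble_disagreement(poses, model_points):
    """Mean ADD over all unordered pairs of ensemble predictions (needs >= 2 poses)."""
    pairs = list(combinations(poses, 2))
    return sum(add_distance(a, b, model_points) for a, b in pairs) / len(pairs)


# Toy input (assumed, in metres): three predictions that differ only in translation.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
poses = [
    (IDENTITY, [0.0, 0.0, 0.0]),
    (IDENTITY, [0.03, 0.0, 0.0]),
    (IDENTITY, [0.0, 0.04, 0.0]),
]
model_points = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]

uncertainty = ensemble_disagreement(poses, model_points)
print(round(uncertainty, 4))  # pairwise ADDs are 0.03, 0.04 and 0.05, so the mean is 0.04
```

A larger value means the ensemble members disagree more. How that value is thresholded or calibrated for a downstream task such as grasping is not specified here and would be a separate design choice.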

Joint Space Control via Deep Reinforcement Learning

Nov 12, 2020
Visak Kumar, David Hoeller, Balakumar Sundaralingam, Jonathan Tremblay, Stan Birchfield

The dominant way to control a robot manipulator uses hand-crafted differential equations leveraging some form of inverse kinematics / dynamics. We propose a simple, versatile joint-level controller that dispenses with differential equations entirely. A deep neural network, trained via model-free reinforcement learning, is used to map from task space to joint space. Experiments show that the method achieves error similar to that of traditional methods, while greatly simplifying the process by automatically handling redundancy, joint limits, and acceleration / deceleration profiles. The basic technique is extended to avoid obstacles by augmenting the input to the network with information about the nearest obstacles. Results are shown both in simulation and on a real robot via sim-to-real transfer of the learned policy. We show that it is possible to achieve sub-centimeter accuracy, both in simulation and the real world, with a moderate amount of training.

* Submitted to ICRA 2021

Indirect Object-to-Robot Pose Estimation from an External Monocular RGB Camera

Aug 26, 2020
Jonathan Tremblay, Stephen Tyree, Terry Mosier, Stan Birchfield

We present a robotic grasping system that uses a single external monocular RGB camera as input. The object-to-robot pose is computed indirectly by combining the output of two neural networks: one that estimates the object-to-camera pose, and another that estimates the robot-to-camera pose. Both networks are trained entirely on synthetic data, relying on domain randomization to bridge the sim-to-real gap. Because the latter network performs online camera calibration, the camera can be moved freely during execution without affecting the quality of the grasp. Experimental results analyze the effect of camera placement, image resolution, and pose refinement in the context of grasping several household objects. We also present results on a new set of 28 textured household toy grocery objects, which have been selected to be accessible to other researchers. To aid reproducibility of the research, we offer 3D scanned textured models, along with pre-trained weights for pose estimation.

* IROS 2020. Video at https://youtu.be/E0J91llX-ys
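
The indirect computation described in the abstract above amounts to composing two rigid transforms. The following is a minimal sketch, not the authors' implementation. It assumes both network outputs have already been converted to 4x4 homogeneous matrices expressed in the camera frame. The names `cam_T_obj`, `cam_T_robot`, and the helper functions are hypothetical.

```python
def matmul4(A, B):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert_rigid(T):
    """Invert a rigid transform [R | t] as [R^T | -R^T t]; valid only for rigid transforms."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]  # transpose of the rotation block
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return [Rt[0] + [t_inv[0]], Rt[1] + [t_inv[1]], Rt[2] + [t_inv[2]], [0.0, 0.0, 0.0, 1.0]]


def object_in_robot_frame(cam_T_obj, cam_T_robot):
    """Object pose in the robot frame: robot_T_obj = inverse(cam_T_robot) * cam_T_obj."""
    return matmul4(invert_rigid(cam_T_robot), cam_T_obj)


# Toy input (assumed): robot base rotated 90 degrees about z and offset 1 m along x,
# object at (1, 2, 0) in the camera frame with no rotation.
cam_T_robot = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
cam_T_obj = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

robot_T_obj = object_in_robot_frame(cam_T_obj, cam_T_robot)
print([row[3] for row in robot_T_obj[:3]])  # object position in the robot frame
```

Because the camera frame cancels out in the product, re-estimating `cam_T_robot` on every frame is what lets the camera move without changing the resulting object-to-robot pose, which matches the online calibration idea in the abstract.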

Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning

May 26, 2020
Michelle A. Lee, Carlos Florensa, Jonathan Tremblay, Nathan Ratliff, Animesh Garg, Fabio Ramos, Dieter Fox

Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. On the other hand, reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline, while requiring minimal interactions with the environment. This is achieved by leveraging uncertainty estimates to divide the space into regions where the given model-based policy is reliable, and regions where it may have flaws or not be well defined. In these uncertain regions, we show that a locally learned policy can be used directly with raw sensory inputs. We test our algorithm, Guided Uncertainty-Aware Policy Optimization (GUAPO), on a real-world robot performing peg insertion. Videos are available at https://sites.google.com/view/guapo-rl

* International Conference on Robotics and Automation 2020

PAMTRI: Pose-Aware Multi-Task Learning for Vehicle Re-Identification Using Highly Randomized Synthetic Data

May 02, 2020
Zheng Tang, Milind Naphade, Stan Birchfield, Jonathan Tremblay, William Hodge, Ratnesh Kumar, Shuo Wang, Xiaodong Yang

In comparison with person re-identification (ReID), which has been widely studied in the research community, vehicle ReID has received less attention. Vehicle ReID is challenging due to 1) high intra-class variability (caused by the dependency of shape and appearance on viewpoint), and 2) small inter-class variability (caused by the similarity in shape and appearance between vehicles produced by different manufacturers). To address these challenges, we propose a Pose-Aware Multi-Task Re-Identification (PAMTRI) framework. This approach includes two innovations compared with previous methods. First, it overcomes viewpoint-dependency by explicitly reasoning about vehicle pose and shape via keypoints, heatmaps and segments from pose estimation. Second, it jointly classifies semantic vehicle attributes (colors and types) while performing ReID, through multi-task learning with the embedded pose representations. Since manually labeling images with detailed pose and attribute information is prohibitive, we create a large-scale highly randomized synthetic dataset with automatically annotated vehicle attributes for training. Extensive experiments validate the effectiveness of each proposed component, showing that PAMTRI achieves significant improvement over state-of-the-art on two mainstream vehicle ReID benchmarks: VeRi and CityFlow-ReID. Code and models are available at https://github.com/NVlabs/PAMTRI.

* Accepted by ICCV 2019

Camera-to-Robot Pose Estimation from a Single Image

Dec 05, 2019
Timothy E. Lee, Jonathan Tremblay, Thang To, Jia Cheng, Terry Mosier, Oliver Kroemer, Dieter Fox, Stan Birchfield

We present an approach for estimating the pose of a camera with respect to a robot from a single image. Our method uses a deep neural network to process an RGB image from the camera to detect 2D keypoints on the robot. The network is trained entirely on simulated data using domain randomization. Perspective-$n$-point (P$n$P) is then used to recover the camera extrinsics, assuming that the joint configuration of the robot manipulator is known. Unlike classic hand-eye calibration systems, our method does not require an off-line calibration step but rather is capable of computing the camera extrinsics from a single frame, thus opening the possibility of on-line calibration. We show experimental results for three different camera sensors, demonstrating that our approach is able to achieve accuracy with a single frame that is better than that of classic off-line hand-eye calibration using multiple frames. With additional frames, accuracy improves even further. Code, datasets, and pretrained models for three widely-used robot manipulators will be made available.

* Submitted to ICRA 2020

Contextual Reinforcement Learning of Visuo-tactile Multi-fingered Grasping Policies

Nov 24, 2019
Visak Kumar, Tucker Hermans, Dieter Fox, Stan Birchfield, Jonathan Tremblay

Using simulation to train robot manipulation policies holds the promise of an almost unlimited amount of training data, generated safely out of harm's way. One of the key challenges of using simulation, to date, has been to bridge the reality gap, so that policies trained in simulation can be deployed in the real world. We explore the reality gap in the context of learning a contextual policy for multi-fingered robotic grasping. We propose a Grasping Objects Approach for Tactile (GOAT) robotic hands, learning to overcome the reality gap problem. In our approach, we use human hand motion demonstration to initialize and reduce the search space for learning. We contextualize our policy with the bounding cuboid dimensions of the object of interest, which allows the policy to work on a more flexible representation than directly using an image or point cloud. Leveraging fingertip touch sensors in the hand allows the policy to overcome the reduction in geometric information introduced by the coarse bounding box, as well as pose estimation uncertainty. We show that our learned policy successfully runs on a real robot without any fine-tuning, thus bridging the reality gap.

Directional Semantic Grasping of Real-World Objects: From Simulation to Reality

Sep 04, 2019
Shariq Iqbal, Jonathan Tremblay, Thang To, Jia Cheng, Erik Leitch, Andy Campbell, Kirby Leung, Duncan McKay, Stan Birchfield

We present a deep reinforcement learning approach to grasp semantically meaningful objects from a particular direction. The system is trained entirely in simulation, with sim-to-real transfer accomplished by using a simulator that models physical contact and produces photorealistic imagery with domain randomized backgrounds. The system is an example of end-to-end (mapping input monocular RGB images to output Cartesian motor commands) grasping of objects from multiple pre-defined object-centric orientations, such as from the side or top. Coupled with a real-time 6-DoF object pose estimator, the eye-in-hand system is capable of grasping objects anywhere within the graspable workspace. Results are shown in both simulation and the real world, demonstrating the effectiveness of the approach.

* Video is at https://youtu.be/bjJLtNdVj9w
