Jonathan Kelly

Self-Supervised Pre-training of 3D Point Cloud Networks with Image Data

Dec 13, 2022

Andrej Janda, Brandon Wagstaff, Edwin G. Ng, Jonathan Kelly

Abstract:Reducing the quantity of annotations required for supervised training is vital when labels are scarce and costly. This reduction is especially important for semantic segmentation tasks involving 3D datasets that are often significantly smaller and more challenging to annotate than their image-based counterparts. Self-supervised pre-training on large unlabelled datasets is one way to reduce the amount of manual annotations needed. Previous work has focused on pre-training with point cloud data exclusively; this approach often requires two or more registered views. In the present work, we combine image and point cloud modalities, by first learning self-supervised image features and then using these features to train a 3D model. By incorporating image data, which is often included in many 3D datasets, our pre-training method only requires a single scan of a scene. We demonstrate that our pre-training approach, despite using single scans, achieves comparable performance to other multi-scan, point cloud-only methods.

* Accepted to the Conference on Robot Learning (CoRL'22) Workshop on Pre-training Robot Learning, Auckland, New Zealand, December 14-18, 2022

SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields

Nov 22, 2022

Ashkan Mirzaei, Tristan Aumentado-Armstrong, Konstantinos G. Derpanis, Jonathan Kelly, Marcus A. Brubaker, Igor Gilitschenski, Alex Levinshtein

Abstract:Neural Radiance Fields (NeRFs) have emerged as a popular approach for novel view synthesis. While NeRFs are quickly being adapted for a wider set of applications, intuitively editing NeRF scenes is still an open challenge. One important editing task is the removal of unwanted objects from a 3D scene, such that the replaced region is visually plausible and consistent with its context. We refer to this task as 3D inpainting. In 3D, solutions must be both consistent across multiple views and geometrically valid. In this paper, we propose a novel 3D inpainting method that addresses these challenges. Given a small set of posed images and sparse annotations in a single input image, our framework first rapidly obtains a 3D segmentation mask for a target object. Using the mask, a perceptual optimization-based approach is then introduced that leverages learned 2D image inpainters, distilling their information into 3D space, while ensuring view consistency. We also address the lack of a diverse benchmark for evaluating 3D scene inpainting methods by introducing a dataset comprised of challenging real-world scenes. In particular, our dataset contains views of the same scene with and without a target object, enabling more principled benchmarking of the 3D inpainting task. We first demonstrate the superiority of our approach on multiview segmentation, comparing to NeRF-based methods and 2D segmentation approaches. We then evaluate on the task of 3D inpainting, establishing state-of-the-art performance against other NeRF manipulation algorithms, as well as a strong 2D image inpainter baseline.

* Project Page: https://spinnerf3d.github.io

Spatiotemporal Calibration of 3D mm-Wavelength Radar-Camera Pairs

Nov 03, 2022

Emmett Wise, Qilong Cheng, Jonathan Kelly

Abstract:Autonomous vehicles (AVs) often depend on multiple sensors and sensing modalities to mitigate data degradation and provide a measure of robustness when operating in adverse conditions. Radars and cameras are a popular sensor combination - although radar measurements are sparse in comparison to camera images, radar scans are able to penetrate fog, rain, and snow. Data from both sensors are typically fused in a common reference frame prior to use in downstream perception tasks. However, accurate sensor fusion depends upon knowledge of the spatial transform between the sensors and any temporal misalignment that exists in their measurement times. During the life cycle of an AV, these calibration parameters may change. The ability to perform in-situ spatiotemporal calibration is essential to ensure reliable long-term operation. State-of-the-art 3D radar-camera spatiotemporal calibration algorithms require bespoke calibration targets, which are not readily available in the field. In this paper, we describe an algorithm for targetless spatiotemporal calibration that is able to operate without specialized infrastructure. Our approach leverages the ability of the radar unit to measure its own ego-velocity relative to a fixed external reference frame. We analyze the identifiability of the spatiotemporal calibration problem and determine the motions necessary for calibration. Through a series of simulation studies, we characterize the sensitivity of our algorithm to measurement noise. Finally, we demonstrate accurate calibration for three real-world systems, including a handheld sensor rig and a vehicle-mounted sensor array. Our results show that we are able to match the performance of an existing, target-based method, while calibrating in arbitrary (infrastructure-free) environments.

* Submitted to IEEE Transactions on Robotics, Oct. 2022
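
The abstract above notes that the radar can measure its own ego-velocity relative to a fixed external frame. One common way to obtain such an estimate, sketched here as an illustration and not as the authors' algorithm, is a least-squares fit to the Doppler returns of static targets. The function name, the sign convention (positive radial speed means the target is receding), and the assumption that all targets are static are illustrative choices, not details taken from the paper.

```python
import numpy as np

def estimate_ego_velocity(directions, radial_speeds):
    """Least-squares radar ego-velocity from Doppler returns (hypothetical helper).

    directions:    (N, 3) unit vectors from the radar to each target, radar frame.
    radial_speeds: (N,) measured radial speeds, positive when the target recedes.

    Assumes every target is static, so radial_speed_i = -directions[i] @ v_ego.
    """
    A = -np.asarray(directions, dtype=float)
    b = np.asarray(radial_speeds, dtype=float)
    # Solve A @ v_ego = b in the least-squares sense.
    v_ego, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v_ego
```

At least three targets with linearly independent directions are needed for a unique 3D solution. Real data would typically also require outlier rejection (for example, RANSAC) to discard moving targets, which this sketch omits.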

One Network, Many Robots: Generative Graphical Inverse Kinematics

Sep 22, 2022

Oliver Limoyo, Filip Marić, Matthew Giamou, Petra Alexson, Ivan Petrović, Jonathan Kelly

Abstract:Quickly and reliably finding accurate inverse kinematics (IK) solutions remains a challenging problem for robotic manipulation. Existing numerical solvers are broadly applicable, but rely on local search techniques to manage highly nonconvex objective functions. Recently, learning-based approaches have shown promise as a means to generate fast and accurate IK results; learned solvers can easily be integrated with other learning algorithms in end-to-end systems. However, learning-based methods have an Achilles' heel: each robot of interest requires a specialized model which must be trained from scratch. To address this key shortcoming, we investigate a novel distance-geometric robot representation coupled with a graph structure that allows us to leverage the flexibility of graph neural networks (GNNs). We use this approach to train the first learned generative graphical inverse kinematics (GGIK) solver that is, crucially, "robot-agnostic": a single model is able to provide IK solutions for a variety of different robots. Additionally, the generative nature of GGIK allows the solver to produce a large number of diverse solutions in parallel with minimal additional computation time, making it appropriate for applications such as sampling-based motion planning. Finally, GGIK can complement local IK solvers by providing reliable initializations. These advantages, as well as the ability to use task-relevant priors and to continuously improve with new data, suggest that GGIK has the potential to be a key component of flexible, learning-based robotic manipulation systems.

LaTeRF: Label and Text Driven Object Radiance Fields

Jul 18, 2022

Ashkan Mirzaei, Yash Kant, Jonathan Kelly, Igor Gilitschenski

Abstract:Obtaining 3D object representations is important for creating photo-realistic simulations and for collecting AR and VR assets. Neural fields have shown their effectiveness in learning a continuous volumetric representation of a scene from 2D images, but acquiring object representations from these models with weak supervision remains an open challenge. In this paper we introduce LaTeRF, a method for extracting an object of interest from a scene given 2D images of the entire scene, known camera poses, a natural language description of the object, and a set of point-labels of object and non-object points in the input images. To faithfully extract the object from the scene, LaTeRF extends the NeRF formulation with an additional 'objectness' probability at each 3D point. Additionally, we leverage the rich latent space of a pre-trained CLIP model combined with our differentiable object renderer, to inpaint the occluded parts of the object. We demonstrate high-fidelity object extraction on both synthetic and real-world datasets and justify our design choices through an extensive ablation study.

Learning Sequential Latent Variable Models from Multimodal Time Series Data

Apr 21, 2022

Oliver Limoyo, Trevor Ablett, Jonathan Kelly

Abstract:Sequential modelling of high-dimensional data is an important problem that appears in many domains including model-based reinforcement learning and dynamics identification for control. Latent variable models applied to sequential data (i.e., latent dynamics models) have been shown to be a particularly effective probabilistic approach to solve this problem, especially when dealing with images. However, in many application areas (e.g., robotics), information from multiple sensing modalities is available -- existing latent dynamics methods have not yet been extended to effectively make use of such multimodal sequential data. Multimodal sensor streams can be correlated in a useful manner and often contain complementary information across modalities. In this work, we present a self-supervised generative modelling framework to jointly learn a probabilistic latent state representation of multimodal data and the respective dynamics. Using synthetic and real-world datasets from a multimodal robotic planar pushing task, we demonstrate that our approach leads to significant improvements in prediction and representation quality. Furthermore, we compare to the common learning baseline of concatenating each modality in the latent space and show that our principled probabilistic formulation performs better. Finally, despite being fully self-supervised, we demonstrate that our method is nearly as effective as an existing supervised approach that relies on ground truth labels.

* Accepted to the International Conference on Intelligent Autonomous Systems (IAS'17), Zagreb, Croatia, June 13-16, 2022

A Self-Supervised, Differentiable Kalman Filter for Uncertainty-Aware Visual-Inertial Odometry

Mar 14, 2022

Brandon Wagstaff, Emmett Wise, Jonathan Kelly

Abstract:Traditionally, visual-inertial-odometry (VIO) systems rely on filtering or optimization-based frameworks for robot egomotion estimation. While these methods are accurate under nominal conditions, they are prone to failure in degraded environments, where illumination changes, fast camera motion, or textureless scenes are present. Learning-based systems have the potential to outperform classical implementations in degraded environments, but are, currently, less accurate than classical methods in nominal settings. A third class, of hybrid systems, attempts to leverage the advantages of both systems. Herein, we introduce a framework for training a hybrid VIO system. Our approach uses a differentiable Kalman filter with an IMU-based process model and a robust, neural network-based relative pose measurement model. By utilizing the data efficiency of self-supervised learning, we show that our system significantly outperforms a similar, supervised system, while enabling online retraining. To demonstrate the utility of our approach, we evaluate our system on a visually degraded version of the EuRoC dataset. Notably, we find that, in cases where classical estimators consistently diverge, our estimator does not diverge or suffer from a significant reduction in accuracy. Finally, our system, by properly utilizing the metric information contained in the IMU measurements, is able to recover metric scale, while other self-supervised monocular VIO approaches cannot.

* Submitted to AIM 2022
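
For readers unfamiliar with the filter structure mentioned in the abstract above, the following is a minimal sketch of one predict/update cycle of a standard linear Kalman filter. It is a generic textbook form, not the authors' implementation: the paper's filter uses an IMU-based process model and a learned relative pose measurement model, neither of which is reproduced here. The function names and the matrices F, Q, H, and R are illustrative assumptions.

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Propagate the state mean x and covariance P through a linear process model."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, z, H, R):
    """Correct the predicted state with a measurement z of covariance R."""
    y = z - H @ x_pred                      # innovation
    S = H @ P_pred @ H.T + R                # innovation covariance
    # Kalman gain K = P_pred H^T S^{-1}, computed with a solve instead of an inverse
    # (valid because S and P_pred are symmetric).
    K = np.linalg.solve(S, H @ P_pred).T
    x = x_pred + K @ y
    P = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x, P
```

Every step is composed of matrix products and a linear solve, so the same cycle written in an automatic differentiation framework would allow gradients to flow back to whatever produces z and R. This is the general sense in which a Kalman filter can be made differentiable.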

Fast Object Inertial Parameter Identification for Collaborative Robots

Mar 02, 2022

Philippe Nadeau, Matthew Giamou, Jonathan Kelly

Abstract:Collaborative robots (cobots) are machines designed to work safely alongside people in human-centric environments. Providing cobots with the ability to quickly infer the inertial parameters of manipulated objects will improve their flexibility and enable greater usage in manufacturing and other areas. To ensure safety, cobots are subject to kinematic limits that result in low signal-to-noise ratios (SNR) for velocity, acceleration, and force-torque data. This renders existing inertial parameter identification algorithms prohibitively slow and inaccurate. Motivated by the desire for faster model acquisition, we investigate the use of an approximation of rigid body dynamics to improve the SNR. Additionally, we introduce a mass discretization method that can make use of shape information to quickly identify plausible inertial parameters for a manipulated object. We present extensive simulation studies and real-world experiments demonstrating that our approach complements existing inertial parameter identification methods by specifically targeting the typical cobot operating regime.

* Accepted to the International Conference on Robotics and Automation (ICRA'22), Philadelphia, USA, May 23-27, 2022

Learning to Detect Slip with Barometric Tactile Sensors and a Temporal Convolutional Neural Network

Feb 19, 2022

Abhinav Grover, Philippe Nadeau, Christopher Grebe, Jonathan Kelly

Abstract:The ability to perceive object slip via tactile feedback enables humans to accomplish complex manipulation tasks including maintaining a stable grasp. Despite the utility of tactile information for many applications, tactile sensors have yet to be widely deployed in industrial robotics settings; part of the challenge lies in identifying slip and other events from the tactile data stream. In this paper, we present a learning-based method to detect slip using barometric tactile sensors. These sensors have many desirable properties including high durability and reliability, and are built from inexpensive, off-the-shelf components. We train a temporal convolutional neural network to detect slip, achieving high detection accuracies while displaying robustness to the speed and direction of the slip motion. Further, we test our detector on two manipulation tasks involving a variety of common objects and demonstrate successful generalization to real-world scenarios not seen during training. We argue that barometric tactile sensing technology, combined with data-driven learning, is suitable for many manipulation tasks such as slip compensation.

* To appear in Proceedings of the IEEE International Conference on Robotics and Automation 2022. arXiv admin note: substantial text overlap with arXiv:2103.13460

Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning

Dec 16, 2021

Trevor Ablett, Bryan Chan, Jonathan Kelly

Abstract:Effective exploration continues to be a significant challenge that prevents the deployment of reinforcement learning for many physical systems. This is particularly true for systems with continuous and high-dimensional state and action spaces, such as robotic manipulators. The challenge is accentuated in the sparse rewards setting, where the low-level state information required for the design of dense rewards is unavailable. Adversarial imitation learning (AIL) can partially overcome this barrier by leveraging expert-generated demonstrations of optimal behaviour and providing, essentially, a replacement for dense reward information. Unfortunately, the availability of expert demonstrations does not necessarily improve an agent's capability to explore effectively and, as we empirically show, can lead to inefficient or stagnated learning. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of, in addition to a main task, multiple auxiliary tasks. Subsequently, a hierarchical model is used to learn each task reward and policy through a modified AIL procedure, in which exploration of all tasks is enforced via a scheduler composing different tasks together. This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible. Our experimental results in a challenging multitask robotic manipulation domain indicate that our method compares favourably to supervised imitation learning and to a state-of-the-art AIL method. Code is available at https://github.com/utiasSTARS/lfgp.

* Accepted at the NeurIPS 2021 Deep Reinforcement Learning Workshop, Sydney, Australia
