Camillo J. Taylor

DFineNet: Ego-Motion Estimation and Depth Refinement from Sparse, Noisy Depth Input with RGB Guidance

Apr 10, 2019

Yilun Zhang, Ty Nguyen, Ian D. Miller, Shreyas S. Shivakumar, Steven Chen, Camillo J. Taylor, Vijay Kumar

Abstract: Depth estimation is an important capability for autonomous vehicles to understand and reconstruct 3D environments as well as to avoid obstacles during execution. Accurate depth sensors such as LiDARs are often heavy and expensive and can only provide sparse depth, while lighter depth sensors such as stereo cameras are noisier in comparison. We propose an end-to-end learning algorithm that is capable of using sparse, noisy input depth for refinement and depth completion. Our model also produces the camera pose as a byproduct, making it a great solution for autonomous systems. We evaluate our approach on both indoor and outdoor datasets. Empirical results show that our method performs well on the KITTI dataset when compared to other competing methods, while having superior performance in dealing with sparse, noisy input depth on the TUM dataset.

MAVNet: an Effective Semantic Segmentation Micro-Network for MAV-based Tasks

Apr 03, 2019

Ty Nguyen, Tolga Ozaslan, Ian D. Miller, James Keller, Shreyas Shivakumar, Giuseppe Loianno, Camillo J. Taylor, Vijay Kumar, Joseph H. Harwood, Jennifer Wozencraft

Abstract: Real-time image semantic segmentation is an essential capability to enhance robot autonomy and improve human situational awareness. In this paper, we present MAVNet, a novel deep neural network approach for semantic segmentation suitable for small-scale Micro Aerial Vehicles (MAVs). Our approach is compatible with the size, weight, and power (SWaP) constraints typical of small-scale MAVs, which can only employ small processing units and GPUs. These units typically have limited computational capacity, which has to be shared concurrently with other real-time performance tasks such as visual odometry and path planning. Our proposed solution, MAVNet, is a fast and compact network inspired by ERFNet and features about 400 times fewer parameters in comparison. Experimental results on multiple datasets validate our proposed approach. Additionally, comparisons with other state-of-the-art approaches show that our solution outperforms theirs in terms of speed and accuracy, achieving up to 48 FPS on an NVIDIA 1080Ti and 9 FPS on the NVIDIA Jetson Xavier when processing high-resolution imagery. Our algorithm and datasets are made publicly available.

Monocular Camera Based Fruit Counting and Mapping with Semantic Data Association

Mar 18, 2019

Xu Liu, Steven W. Chen, Chenhao Liu, Shreyas S. Shivakumar, Jnaneshwar Das, Camillo J. Taylor, James Underwood, Vijay Kumar

Abstract: We present a cheap, lightweight, and fast fruit counting pipeline that uses a single monocular camera. On a mango dataset, our pipeline, which relies only on a monocular camera, achieves counting performance comparable to that of a state-of-the-art fruit counting system that uses an expensive sensor suite including LiDAR and GPS/INS. Our monocular camera pipeline begins with a fruit detection component that uses a deep neural network. It then uses semantic structure from motion (SFM) to convert these detections into fruit counts by estimating landmark locations of the fruit in 3D and using these landmarks to identify double counting scenarios. There are many benefits to developing a low-cost and lightweight fruit counting system, including applicability to agriculture in developing countries, where monetary constraints or unstructured environments necessitate cheaper hardware solutions.

* Accepted in IEEE Robotics and Automation Letters (RA-L), 8 pages

DFuseNet: Deep Fusion of RGB and Sparse Depth Information for Image Guided Dense Depth Completion

Feb 02, 2019

Shreyas S. Shivakumar, Ty Nguyen, Steven W. Chen, Camillo J. Taylor

Abstract: In this paper, we propose a convolutional neural network that is designed to upsample a series of sparse range measurements based on the contextual cues gleaned from a high-resolution intensity image. Our approach draws inspiration from related work on super-resolution and in-painting. We propose a novel architecture that seeks to pull contextual cues separately from the intensity image and the depth features and then fuse them later in the network. We argue that this approach effectively exploits the relationship between the two modalities and produces accurate results while respecting salient image structures. We present experimental results to demonstrate that our approach is comparable with state-of-the-art methods and generalizes well across multiple datasets.

* 11 pages

Simultaneous Localization and Layout Model Selection in Manhattan Worlds

Dec 13, 2018

Armon Shariati, Bernd Pfrommer, Camillo J. Taylor

Abstract: In this paper, we demonstrate how Manhattan structure can be exploited to transform the Simultaneous Localization and Mapping (SLAM) problem, which is typically solved by a nonlinear optimization over feature positions, into a model selection problem solved by a convex optimization over higher-order layout structures, namely walls, floors, and ceilings. Furthermore, we show how our novel formulation leads to an optimization procedure that automatically performs data association and loop closure and that ultimately produces the simplest model of the environment that is consistent with the available measurements. We verify our method on real-world data sets collected with various sensing modalities.

Predictive and Semantic Layout Estimation for Robotic Applications in Manhattan Worlds

Nov 19, 2018

Armon Shariati, Bernd Pfrommer, Camillo J. Taylor

Abstract: This paper describes an approach to automatically extracting floor plans from the kinds of incomplete measurements that could be acquired by an autonomous mobile robot. The approach proceeds by reasoning about extended structural layout surfaces, which are automatically extracted from the available data. The scheme can be run in an online manner to build watertight representations of the environment. The system effectively speculates about room boundaries and free space regions, which provides useful guidance to subsequent motion planning systems. Experimental results are presented on multiple data sets.

Real Time Dense Depth Estimation by Fusing Stereo with Sparse Depth Measurements

Sep 20, 2018

Shreyas S. Shivakumar, Kartik Mohta, Bernd Pfrommer, Vijay Kumar, Camillo J. Taylor

Abstract: We present an approach to depth estimation that fuses information from a stereo pair with sparse range measurements derived from a LIDAR sensor or a range camera. The goal of this work is to exploit the complementary strengths of the two sensor modalities: the accurate but sparse range measurements and the ambiguous but dense stereo information. These two sources are effectively and efficiently fused by combining ideas from anisotropic diffusion and semi-global matching. We evaluate our approach on the KITTI 2015 and Middlebury 2014 datasets, using randomly sampled ground truth range measurements as our sparse depth input. We achieve significant performance improvements with a small fraction of range measurements on both datasets. We also provide qualitative results from our platform using the PMDTec Monstar sensor. Our entire pipeline runs on an NVIDIA TX-2 platform at 5 Hz on 1280x1024 stereo images with 128 disparity levels.

* 7 pages, 5 figures, 2 tables
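
The evaluation setup mentioned in the abstract, where sparse depth input is obtained by randomly sampling ground truth range measurements, might look roughly like the sketch below. The function name `sample_sparse_depth`, the `fraction` parameter, and the convention that 0.0 marks a pixel without depth are all assumptions made for illustration; the listing does not describe the authors' actual sampling code.

```python
import random

def sample_sparse_depth(dense, fraction, seed=0, missing=0.0):
    """Keep a random `fraction` of the valid pixels of a dense depth map.

    `dense` is a list of rows of floats. Pixels equal to `missing` are
    treated as having no ground truth (an assumed convention).
    """
    rng = random.Random(seed)
    # Collect coordinates of every pixel that carries a depth value.
    valid = [(r, c) for r, row in enumerate(dense)
             for c, d in enumerate(row) if d != missing]
    # Choose which of those pixels survive as sparse measurements.
    keep = set(rng.sample(valid, int(round(fraction * len(valid)))))
    return [[d if (r, c) in keep else missing for c, d in enumerate(row)]
            for r, row in enumerate(dense)]

# Hypothetical 10x10 ground truth map with depths of at least 1.0.
dense = [[1.0 + r + 0.1 * c for c in range(10)] for r in range(10)]
sparse = sample_sparse_depth(dense, fraction=0.05)
kept = sum(d != 0.0 for row in sparse for d in row)
print(kept, "of", 100, "pixels kept")
```

Fixing the seed keeps the sampled pixels reproducible across runs, which is convenient when comparing methods on the same sparse input.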

The Open Vision Computer: An Integrated Sensing and Compute System for Mobile Robots

Sep 20, 2018

Morgan Quigley, Kartik Mohta, Shreyas S. Shivakumar, Michael Watterson, Yash Mulgaonkar, Mikael Arguedas, Ke Sun, Sikang Liu, Bernd Pfrommer, Vijay Kumar (+1 more)

Abstract: In this paper, we describe the Open Vision Computer (OVC), which was designed to support high-speed, vision-guided autonomous drone flight. In particular, our aim was to develop a system that would be suitable for relatively small-scale flying platforms where size, weight, power consumption and computational performance were all important considerations. This manuscript describes the primary features of our OVC system and explains how they are used to support fully autonomous indoor and outdoor exploration and navigation operations on our Falcon 250 quadrotor platform.

* 7 pages, 13 figures, conference

U-Net for MAV-based Penstock Inspection: an Investigation of Focal Loss in Multi-class Segmentation for Corrosion Identification

Sep 18, 2018

Ty Nguyen, Tolga Ozaslan, Ian D. Miller, James Keller, Giuseppe Loianno, Camillo J. Taylor, Daniel D. Lee, Vijay Kumar, Joseph H. Harwood, Jennifer Wozencraft

Abstract: Periodic inspection and maintenance of critical infrastructure such as dams, penstocks, and locks are of significant importance to prevent catastrophic failures. Conventional manual inspection methods require inspectors to climb along a penstock to spot corrosion, rust and crack formation, which is unsafe, labor-intensive, and requires intensive training. This work presents an alternative approach using a Micro Aerial Vehicle (MAV) that autonomously flies to collect imagery, which is then fed into a pretrained deep-learning model to identify corrosion. Our simplified U-Net, trained with fewer than 40 image samples, can do inference at 12 fps on a single GPU. We analyze different loss functions to solve the class imbalance problem, followed by a discussion on choosing proper metrics and weights for object classes. Results obtained with the dataset collected from Center Hill Dam, TN show that the focal loss function, combined with a proper set of class weights, yields better segmentation results than the base loss, softmax cross entropy. Our method can be used in combination with a planning algorithm to offer a complete, safe and cost-efficient solution to autonomous infrastructure inspection.

* 8 Pages, 4 figures
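
To make the comparison between focal loss and softmax cross entropy concrete, here is a minimal per-pixel sketch of the commonly used class-weighted focal loss, `-w_t * (1 - p_t)^gamma * log(p_t)`. The value `gamma=2.0`, the example logits, and the class weights are assumptions chosen for illustration; the abstract does not state the settings or the exact variant used in the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, target, class_weights=None, gamma=2.0):
    """Class-weighted focal loss for one pixel.

    With gamma=0 and no class weights this reduces to plain softmax
    cross entropy. `gamma` and `class_weights` are assumed values.
    """
    p_t = softmax(logits)[target]
    w_t = 1.0 if class_weights is None else class_weights[target]
    # (1 - p_t)^gamma shrinks the loss of pixels that are already
    # classified confidently, so hard pixels dominate the total.
    return -w_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = [4.0, 0.0, 0.0]   # confident, correct prediction for class 0
hard = [0.0, 1.0, 0.0]   # class 0 is the label, but class 1 scores higher
ce_easy = focal_loss(easy, 0, gamma=0.0)
ce_hard = focal_loss(hard, 0, gamma=0.0)
fl_easy = focal_loss(easy, 0)
fl_hard = focal_loss(hard, 0)
print(fl_easy / ce_easy, fl_hard / ce_hard)
```

The printed ratios show how strongly each pixel is down-weighted relative to cross entropy: the easy pixel is suppressed far more than the hard one, which is the property that helps when a rare class such as corrosion is swamped by easy background pixels.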

Robust Fruit Counting: Combining Deep Learning, Tracking, and Structure from Motion

Aug 02, 2018

Xu Liu, Steven W. Chen, Shreyas Aditya, Nivedha Sivakumar, Sandeep Dcunha, Chao Qu, Camillo J. Taylor, Jnaneshwar Das, Vijay Kumar

Abstract: We present a novel fruit counting pipeline that combines deep segmentation, frame-to-frame tracking, and 3D localization to accurately count visible fruits across a sequence of images. Our pipeline works on image streams from a monocular camera, both in natural light and with controlled illumination at night. We first train a Fully Convolutional Network (FCN) and segment video frame images into fruit and non-fruit pixels. We then track fruits across frames using the Hungarian Algorithm, where the objective cost is determined from a Kalman Filter corrected Kanade-Lucas-Tomasi (KLT) Tracker. In order to correct the estimated count from the tracking process, we combine tracking results with a Structure from Motion (SfM) algorithm to calculate relative 3D locations and size estimates to reject outliers and double-counted fruit tracks. We evaluate our algorithm by comparing with ground-truth human-annotated visual counts. Our results demonstrate that our pipeline is able to accurately and reliably count fruits across image sequences, and the correction step can significantly improve the counting accuracy and robustness. Although discussed in the context of fruit counting, our work can extend to detection, tracking, and counting of a variety of other stationary features of interest such as leaf-spots, wilt, and blossom.

* Accepted in IROS 2018 (2018 IEEE/RSJ International Conference on Intelligent Robots and Systems)
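
The frame-to-frame matching step can be illustrated as a minimum-cost assignment between predicted track positions and new detections, which is the problem the Hungarian Algorithm solves. The sketch below is not the authors' tracker: it finds the same optimum by brute force over permutations (practical only for a handful of fruits), and it assumes squared pixel distance as the cost, whereas the abstract derives the cost from a Kalman Filter corrected KLT tracker. All names and coordinates are hypothetical.

```python
from itertools import permutations

def squared_distance(a, b):
    """Squared Euclidean distance between two (x, y) points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def assign_tracks(predicted, detections):
    """Match every track to a distinct detection at minimum total cost.

    Brute-force stand-in for the Hungarian Algorithm; requires
    len(predicted) <= len(detections).
    """
    cost = [[squared_distance(p, d) for d in detections] for p in predicted]
    # Try every way of giving each track its own detection.
    best = min(permutations(range(len(detections)), len(predicted)),
               key=lambda perm: sum(cost[i][j] for i, j in enumerate(perm)))
    return list(enumerate(best))

predicted = [(10, 10), (50, 50)]                # assumed predicted positions
detections = [(52, 49), (11, 9), (200, 200)]    # assumed new detections
matches = assign_tracks(predicted, detections)
print(matches)  # → [(0, 1), (1, 0)]
```

Detection 2 stays unmatched here; in a full tracker such a leftover would typically start a new track, and a cost threshold would reject implausibly distant matches.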
