Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"autonomous cars": models, code, and papers

Separable Convolutional LSTMs for Faster Video Segmentation

Jul 16, 2019
Andreas Pfeuffer, Klaus Dietmayer

Semantic Segmentation is an important module for autonomous robots such as self-driving cars. The advantage of video segmentation approaches compared to single image segmentation is that temporal image information is considered, and their performance increases due to this. Hence, single image segmentation approaches are extended by recurrent units such as convolutional LSTM (convLSTM) cells, which are placed at suitable positions in the basic network architecture. However, a major critique of video segmentation approaches based on recurrent neural networks is their large parameter count and their computational complexity, and so, their inference time of one video frame takes up to 66 percent longer than their basic version. Inspired by the success of the spatial and depthwise separable convolutional neural networks, we generalize these techniques for convLSTMs in this work, so that the number of parameters and the required FLOPs are reduced significantly. Experiments on different datasets show that the segmentation approaches using the proposed, modified convLSTM cells achieve similar or slightly worse accuracy, but are up to 15 percent faster on a GPU than the ones using the standard convLSTM cells. Furthermore, a new evaluation metric is introduced, which measures the amount of flickering pixels in the segmented video sequence.

* 2019 22st International Conference on Intelligent Transportation Systems (ITSC) 
Access Paper or Ask Questions

NYU-VPR: Long-Term Visual Place Recognition Benchmark with View Direction and Data Anonymization Influences

Oct 18, 2021
Diwei Sheng, Yuxiang Chai, Xinru Li, Chen Feng, Jianzhe Lin, Claudio Silva, John-Ross Rizzo

Visual place recognition (VPR) is critical in not only localization and mapping for autonomous driving vehicles, but also assistive navigation for the visually impaired population. To enable a long-term VPR system on a large scale, several challenges need to be addressed. First, different applications could require different image view directions, such as front views for self-driving cars while side views for the low vision people. Second, VPR in metropolitan scenes can often cause privacy concerns due to the imaging of pedestrian and vehicle identity information, calling for the need for data anonymization before VPR queries and database construction. Both factors could lead to VPR performance variations that are not well understood yet. To study their influences, we present the NYU-VPR dataset that contains more than 200,000 images over a 2km by 2km area near the New York University campus, taken within the whole year of 2016. We present benchmark results on several popular VPR algorithms showing that side views are significantly more challenging for current VPR methods while the influence of data anonymization is almost negligible, together with our hypothetical explanations and in-depth analysis.

* 7 pages, 10 figures, published in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021) 
Access Paper or Ask Questions

Multi Projection Fusion for Real-time Semantic Segmentation of 3D LiDAR Point Clouds

Nov 06, 2020
Yara Ali Alnaggar, Mohamed Afifi, Karim Amer, Mohamed Elhelw

Semantic segmentation of 3D point cloud data is essential for enhanced high-level perception in autonomous platforms. Furthermore, given the increasing deployment of LiDAR sensors onboard of cars and drones, a special emphasis is also placed on non-computationally intensive algorithms that operate on mobile GPUs. Previous efficient state-of-the-art methods relied on 2D spherical projection of point clouds as input for 2D fully convolutional neural networks to balance the accuracy-speed trade-off. This paper introduces a novel approach for 3D point cloud semantic segmentation that exploits multiple projections of the point cloud to mitigate the loss of information inherent in single projection methods. Our Multi-Projection Fusion (MPF) framework analyzes spherical and bird's-eye view projections using two separate highly-efficient 2D fully convolutional models then combines the segmentation results of both views. The proposed framework is validated on the SemanticKITTI dataset where it achieved a mIoU of 55.5 which is higher than state-of-the-art projection-based methods RangeNet++ and PolarNet while being 1.6x faster than the former and 3.1x faster than the latter.

* Accepted at the 2021 Winter Conference on Applications of Computer Vision (WACV 2021) 
Access Paper or Ask Questions

Don't Forget The Past: Recurrent Depth Estimation from Monocular Video

Jan 08, 2020
Vaishakh Patil, Wouter Van Gansbeke, Dengxin Dai, Luc Van Gool

Autonomous cars need continuously updated depth information. Thus far, the depth is mostly estimated independently for a single frame at a time, even if the method starts from video input. Our method produces a time series of depth maps, which makes it an ideal candidate for online learning approaches. In particular, we put three different types of depth estimation (supervised depth prediction, self-supervised depth prediction, and self-supervised depth completion) into a common framework. We integrate the corresponding networks with a convolutional LSTM such that the spatiotemporal structures of depth across frames can be exploited to yield a more accurate depth estimation. Our method is flexible. It can be applied to monocular videos only or be combined with different types of sparse depth patterns. We carefully study the architecture of the recurrent network and its training strategy. We are first to successfully exploit recurrent networks for real-time self-supervised monocular depth estimation and completion. Extensive experiments show that our recurrent method outperforms its image-based counterpart consistently and significantly in both self-supervised scenarios. It also outperforms previous depth estimation methods of the three popular groups.

Access Paper or Ask Questions

Detecting Adversarial Samples for Deep Neural Networks through Mutation Testing

May 17, 2018
Jingyi Wang, Jun Sun, Peixin Zhang, Xinyu Wang

Recently, it has been shown that deep neural networks (DNN) are subject to attacks through adversarial samples. Adversarial samples are often crafted through adversarial perturbation, i.e., manipulating the original sample with minor modifications so that the DNN model labels the sample incorrectly. Given that it is almost impossible to train perfect DNN, adversarial samples are shown to be easy to generate. As DNN are increasingly used in safety-critical systems like autonomous cars, it is crucial to develop techniques for defending such attacks. Existing defense mechanisms which aim to make adversarial perturbation challenging have been shown to be ineffective. In this work, we propose an alternative approach. We first observe that adversarial samples are much more sensitive to perturbations than normal samples. That is, if we impose random perturbations on a normal and an adversarial sample respectively, there is a significant difference between the ratio of label change due to the perturbations. Observing this, we design a statistical adversary detection algorithm called nMutant (inspired by mutation testing from software engineering community). Our experiments show that nMutant effectively detects most of the adversarial samples generated by recently proposed attacking methods. Furthermore, we provide an error bound with certain statistical significance along with the detection.

* Sumitted to NIPS 2018 
Access Paper or Ask Questions

JointDNN: An Efficient Training and Inference Engine for Intelligent Mobile Cloud Computing Services

Jan 25, 2018
Amir Erfan Eshratifar, Mohammad Saeed Abrishami, Massoud Pedram

Deep neural networks are among the most influential architectures of deep learning algorithms, being deployed in many mobile intelligent applications. End-side services, such as intelligent personal assistants (IPAs), autonomous cars, and smart home services often employ either simple local models or complex remote models on the cloud. Mobile-only and cloud-only computations are currently the status quo approaches. In this paper, we propose an efficient, adaptive, and practical engine, JointDNN, for collaborative computation between a mobile device and cloud for DNNs in both inference and training phase. JointDNN not only provides an energy and performance efficient method of querying DNNs for the mobile side, but also benefits the cloud server by reducing the amount of its workload and communications compared to the cloud-only approach. Given the DNN architecture, we investigate the efficiency of processing some layers on the mobile device and some layers on the cloud server. We provide optimization formulations at layer granularity for forward and backward propagation in DNNs, which can adapt to mobile battery limitations and cloud server load constraints and quality of service. JointDNN achieves up to 18X and 32X reductions on the latency and mobile energy consumption of querying DNNs, respectively.

Access Paper or Ask Questions

Deep Interference Mitigation and Denoising of Real-World FMCW Radar Signals

Dec 04, 2020
Johanna Rock, Mate Toth, Paul Meissner, Franz Pernkopf

Radar sensors are crucial for environment perception of driver assistance systems as well as autonomous cars. Key performance factors are a fine range resolution and the possibility to directly measure velocity. With a rising number of radar sensors and the so far unregulated automotive radar frequency band, mutual interference is inevitable and must be dealt with. Sensors must be capable of detecting, or even mitigating the harmful effects of interference, which include a decreased detection sensitivity. In this paper, we evaluate a Convolutional Neural Network (CNN)-based approach for interference mitigation on real-world radar measurements. We combine real measurements with simulated interference in order to create input-output data suitable for training the model. We analyze the performance to model complexity relation on simulated and measurement data, based on an extensive parameter search. Further, a finite sample size performance comparison shows the effectiveness of the model trained on either simulated or real data as well as for transfer learning. A comparative performance analysis with the state of the art emphasizes the potential of CNN-based models for interference mitigation and denoising of real-world measurements, also considering resource constraints of the hardware.

* 2020 IEEE International Radar Conference (RADAR) 
Access Paper or Ask Questions

Complex Signal Denoising and Interference Mitigation for Automotive Radar Using Convolutional Neural Networks

Jun 25, 2019
Johanna Rock, Mate Toth, Elmar Messner, Paul Meissner, Franz Pernkopf

Driver assistance systems as well as autonomous cars have to rely on sensors to perceive their environment. A heterogeneous set of sensors is used to perform this task robustly. Among them, radar sensors are indispensable because of their range resolution and the possibility to directly measure velocity. Since more and more radar sensors are deployed on the streets, mutual interference must be dealt with. In the so far unregulated automotive radar frequency band, a sensor must be capable of detecting, or even mitigating the harmful effects of interference, which include a decreased detection sensitivity. In this paper, we address this issue with Convolutional Neural Networks (CNNs), which are state-of-the-art machine learning tools. We show that the ability of CNNs to find structured information in data while preserving local information enables superior denoising performance. To achieve this, CNN parameters are found using training with simulated data and integrated into the automotive radar signal processing chain. The presented method is compared with the state of the art, highlighting its promising performance. Hence, CNNs can be employed for interference mitigation as an alternative to conventional signal processing methods. Code and pre-trained models are available at

* FUSION 2019; 8 pages 
Access Paper or Ask Questions

KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D

Sep 28, 2021
Yiyi Liao, Jun Xie, Andreas Geiger

For the last few decades, several major subfields of artificial intelligence including computer vision, graphics, and robotics have progressed largely independently from each other. Recently, however, the community has realized that progress towards robust intelligent systems such as self-driving cars requires a concerted effort across the different fields. This motivated us to develop KITTI-360, successor of the popular KITTI dataset. KITTI-360 is a suburban driving dataset which comprises richer input modalities, comprehensive semantic instance annotations and accurate localization to facilitate research at the intersection of vision, graphics and robotics. For efficient annotation, we created a tool to label 3D scenes with bounding primitives and developed a model that transfers this information into the 2D image domain, resulting in over 150k semantic and instance annotated images and 1B annotated 3D points. Moreover, we established benchmarks and baselines for several tasks relevant to mobile perception, encompassing problems from computer vision, graphics, and robotics on the same dataset. KITTI-360 will enable progress at the intersection of these research areas and thus contributing towards solving one of our grand challenges: the development of fully autonomous self-driving systems.

* arXiv admin note: text overlap with arXiv:1511.03240 
Access Paper or Ask Questions