The availability of real-world datasets is the prerequisite to develop object detection methods for autonomous driving. While ambiguity exists in object labels due to error-prone annotation process or sensor observation noises, current object detection datasets only provide deterministic annotations, without considering their uncertainty. This precludes an in-depth evaluation among different object detection methods, especially for those that explicitly model predictive probability. In this work, we propose a generative model to estimate bounding box label uncertainties from LiDAR point clouds, and define a new representation of the probabilistic bounding box through spatial distribution. Comprehensive experiments show that the proposed model represents uncertainties commonly seen in driving scenarios. Based on the spatial distribution, we further propose an extension of IoU, called the Jaccard IoU (JIoU), as a new evaluation metric that incorporates label uncertainty. The experiments on the KITTI and the Waymo Open Datasets show that JIoU is superior to IoU when evaluating probabilistic object detectors.
This work presents a probabilistic deep neural network that combines LiDAR point clouds and RGB camera images for robust, accurate 3D object detection. We explicitly model uncertainties in the classification and regression tasks, and leverage uncertainties to train the fusion network via a sampling mechanism. We validate our method on three datasets with challenging real-world driving scenarios. Experimental results show that the predicted uncertainties reflect complex environmental uncertainty like difficulties of a human expert to label objects. The results also show that our method consistently improves the Average Precision by up to 7% compared to the baseline method. When sensors are temporally misaligned, the sampling method improves the Average Precision by up to 20%, showing its high robustness against noisy sensor inputs.
Reliable uncertainty estimation is crucial for perception systems in safe autonomous driving. Recently, many methods have been proposed to model uncertainties in deep learning based object detectors. However, the estimated probabilities are often uncalibrated, which may lead to severe problems in safety critical scenarios. In this work, we identify such uncertainty miscalibration problems in a probabilistic LiDAR 3D object detection network, and propose three practical methods to significantly reduce errors in uncertainty calibration. Extensive experiments on several datasets show that our methods produce well-calibrated uncertainties, and generalize well between different datasets.
Recent advancements in the perception for autonomous driving are driven by deep learning. In order to achieve the robust and accurate scene understanding, autonomous vehicles are usually equipped with different sensors (e.g. cameras, LiDARs, Radars), and multiple sensing modalities can be fused to exploit their complementary properties. In this context, many methods have been proposed for deep multi-modal perception problems. However, there is no general guideline for network architecture design, and questions of "what to fuse", "when to fuse", and "how to fuse" remain open. This review paper attempts to systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving. To this end, we first provide an overview of on-board sensors on test vehicles, open datasets and the background information of object detection and semantic segmentation for the autonomous driving research. We then summarize the fusion methodologies and discuss challenges and open questions. In the appendix, we provide tables that summarize topics and methods. We also provide an interactive online platform to navigate each reference: https://multimodalperception.github.io.
We present a robust real-time LiDAR 3D object detector that leverages heteroscedastic aleatoric uncertainties to significantly improve its detection performance. A multi-loss function is designed to incorporate uncertainty estimations predicted by auxiliary output layers. Using our proposed method, the network ignores to train from noisy samples, and focuses more on informative ones. We validate our method on the KITTI object detection benchmark. Our method surpasses the baseline method which does not explicitly estimate uncertainties by up to nearly 9% in terms of Average Precision (AP). It also produces state-of-the-art results compared to other methods while running with an inference time of only 72 ms. In addition, we conduct extensive experiments to understand how aleatoric uncertainties behave. Extracting aleatoric uncertainties brings almost no additional computation cost during the deployment, making our method highly desirable for autonomous driving applications.
Training a deep object detector for autonomous driving requires a huge amount of labeled data. While recording data via on-board sensors such as camera or LiDAR is relatively easy, annotating data is very tedious and time-consuming, especially when dealing with 3D LiDAR points or radar data. Active learning has the potential to minimize human annotation efforts while maximizing the object detector's performance. In this work, we propose an active learning method to train a LiDAR 3D object detector with the least amount of labeled training data necessary. The detector leverages 2D region proposals generated from the RGB images to reduce the search space of objects and speed up the learning process. Experiments show that our proposed method works under different uncertainty estimations and query functions, and can save up to 60% of the labeling efforts while reaching the same network performance.
To assure that an autonomous car is driving safely on public roads, its object detection module should not only work correctly, but show its prediction confidence as well. Previous object detectors driven by deep learning do not explicitly model uncertainties in the neural network. We tackle with this problem by presenting practical methods to capture uncertainties in a 3D vehicle detector for Lidar point clouds. The proposed probabilistic detector represents reliable epistemic uncertainty and aleatoric uncertainty in classification and localization tasks. Experimental results show that the epistemic uncertainty is related to the detection accuracy, whereas the aleatoric uncertainty is influenced by vehicle distance and occlusion. The results also show that we can improve the detection performance by 1%-5% by modeling the aleatoric uncertainty.
Reusing the tactile knowledge of some previously-explored objects helps us humans to easily recognize the tactual properties of new objects. In this master thesis, we enable arobotic arm equipped with multi-modal artificial skin, like humans, to actively transfer the prior tactile exploratory action experiences when it learns the detailed physical properties of new objects. These prior tactile experiences are built when the robot applies the pressing, sliding and static contact movements on objects with different action parameters and perceives the tactile feedbacks from multiple sensory modalities. Our method was systematically evaluated by several experiments. Results show that the robot could consistently improve the discrimination accuracy by over 10% when it exploited the prior tactile knowledge compared with using no transfer method, and 25% when it used only one training sample. The results also show that the proposed method was robust against transferring irrelevant prior tactile knowledge.