Abstract:Active Learning has proved to be a relevant approach to perform sample selection for training models for Autonomous Driving. Particularly, previous works on active learning for 3D object detection have shown that selection of samples in uncontrolled scenarios is challenging. Furthermore, current approaches focus exclusively on the theoretical aspects of the sample selection problem but neglect the practical insights that can be obtained from the extensive literature and application of 3D detection models. In this paper, we introduce HeAL (Heuristical-enhanced Active Learning for 3D Object Detection) which integrates those heuristical features together with Localization and Classification to deliver the most contributing samples to the model's training. In contrast to previous works, our approach integrates heuristical features such as object distance and point-quantity to estimate the uncertainty, which enhance the usefulness of selected samples to train detection models. Our quantitative evaluation on KITTI shows that HeAL presents competitive mAP with respect to the State-of-the-Art, and achieves the same mAP as the full-supervised baseline with only 24% of the samples.
Abstract:Deep learning models for object detection in autonomous driving have recently achieved impressive performance gains and are already being deployed in vehicles worldwide. However, current models require increasingly large datasets for training. Acquiring and labeling such data is costly, necessitating the development of new strategies to optimize this process. Active learning is a promising approach that has been extensively researched in the image domain. In our work, we extend this concept to the LiDAR domain by developing several inconsistency-based sample selection strategies and evaluate their effectiveness in various settings. Our results show that using a naive inconsistency approach based on the number of detected boxes, we achieve the same mAP as the random sampling strategy with 50% of the labeled data.
Abstract:Large Vision Language Models (LVLMs) have shown strong capabilities in understanding and analyzing visual scenes across various domains. However, in the context of autonomous driving, their limited comprehension of 3D environments restricts their effectiveness in achieving a complete and safe understanding of dynamic surroundings. To address this, we introduce V3LMA, a novel approach that enhances 3D scene understanding by integrating Large Language Models (LLMs) with LVLMs. V3LMA leverages textual descriptions generated from object detections and video inputs, significantly boosting performance without requiring fine-tuning. Through a dedicated preprocessing pipeline that extracts 3D object data, our method improves situational awareness and decision-making in complex traffic scenarios, achieving a score of 0.56 on the LingoQA benchmark. We further explore different fusion strategies and token combinations with the goal of advancing the interpretation of traffic scenes, ultimately enabling safer autonomous driving systems.
Abstract:This work presents a methodology to estimate tire parameters and their uncertainty using a Bayesian optimization approach. The literature mainly considers the estimation of tire parameters but lacks an evaluation of the parameter identification quality and the required slip ratios for an adequate model fit. Therefore, we examine the use of Stochastical Variational Inference as a methodology to estimate both - the parameters and their uncertainties. We evaluate the method compared to a state-of-the-art Nelder-Mead algorithm for theoretical and real-world application. The theoretical study considers parameter fitting at different slip ratios to evaluate the required excitation for an adequate fitting of each parameter. The results are compared to a sensitivity analysis for a Pacejka Magic Formula tire model. We show the application of the algorithm on real-world data acquired during the Abu Dhabi Autonomous Racing League and highlight the uncertainties in identifying the curvature and shape parameters due to insufficient excitation. The gathered insights can help assess the acquired data's limitations and instead utilize standardized parameters until higher slip ratios are captured. We show that our proposed method can be used to assess the mean values and the uncertainties of tire model parameters in real-world conditions and derive actions for the tire modeling based on our simulative study.
Abstract:Usually, a controller for path- or trajectory tracking is employed in autonomous driving. Typically, these controllers generate high-level commands like longitudinal acceleration or force. However, vehicles with combustion engines expect different actuation inputs. This paper proposes a longitudinal control concept that translates high-level trajectory-tracking commands to the required low-level vehicle commands such as throttle, brake pressure and a desired gear. We chose a modular structure to easily integrate different trajectory-tracking control algorithms and vehicles. The proposed control concept enables a close tracking of the high-level control command. An anti-lock braking system, traction control, and brake warmup control also ensure a safe operation during real-world tests. We provide experimental validation of our concept using real world data with longitudinal accelerations reaching up to $25 \, \frac{\mathrm{m}}{\mathrm{s}^2}$. The experiments were conducted using the EAV24 racecar during the first event of the Abu Dhabi Autonomous Racing League on the Yas Marina Formula 1 Circuit.
Abstract:Autonomous driving is a complex undertaking. A common approach is to break down the driving task into individual subtasks through modularization. These sub-modules are usually developed and published separately. However, if these individually developed algorithms have to be combined again to form a full-stack autonomous driving software, this poses particular challenges. Drawing upon our practical experience in developing the software of TUM Autonomous Motorsport, we have identified and derived these challenges in developing an autonomous driving software stack within a scientific environment. We do not focus on the specific challenges of individual algorithms but on the general difficulties that arise when deploying research algorithms on real-world test vehicles. To overcome these challenges, we introduce strategies that have been effective in our development approach. We additionally provide open-source implementations that enable these concepts on GitHub. As a result, this paper's contributions will simplify future full-stack autonomous driving projects, which are essential for a thorough evaluation of the individual algorithms.
Abstract:Conventional trajectory planning approaches for autonomous vehicles often assume a fixed vehicle model that remains constant regardless of the vehicle's location. This overlooks the critical fact that the tires and the surface are the two force-transmitting partners in vehicle dynamics; while the tires stay with the vehicle, surface conditions vary with location. Recognizing these challenges, this paper presents a novel framework for spatially resolving dynamic constraints in both offline and online planning algorithms applied to autonomous racing. We introduce the GripMap concept, which provides a spatial resolution of vehicle dynamic constraints in the Frenet frame, allowing adaptation to locally varying grip conditions. This enables compensation for location-specific effects, more efficient vehicle behavior, and increased safety, unattainable with spatially invariant vehicle models. The focus is on low storage demand and quick access through perfect hashing. This framework proved advantageous in real-world applications in the presented form. Experiments inspired by autonomous racing demonstrate its effectiveness. In future work, this framework can serve as a foundational layer for developing future interpretable learning algorithms that adjust to varying grip conditions in real-time.
Abstract:In autonomous systems, sensor calibration is essential for a safe and efficient navigation in dynamic environments. Accurate calibration is a prerequisite for reliable perception and planning tasks such as object detection and obstacle avoidance. Many existing LiDAR calibration methods require overlapping fields of view, while others use external sensing devices or postulate a feature-rich environment. In addition, Sensor-to-Vehicle calibration is not supported by the vast majority of calibration algorithms. In this work, we propose a novel target-based technique for extrinsic Sensor-to-Sensor and Sensor-to-Vehicle calibration of multi-LiDAR systems called CaLiV. This algorithm works for non-overlapping FoVs, as well as arbitrary calibration targets, and does not require any external sensing devices. First, we apply motion to produce FoV overlaps and utilize a simple unscented Kalman filter to obtain vehicle poses. Then, we use the Gaussian mixture model-based registration framework GMMCalib to align the point clouds in a common calibration frame. Finally, we reduce the task of recovering the sensor extrinsics to a minimization problem. We show that both translational and rotational Sensor-to-Sensor errors can be solved accurately by our method. In addition, all Sensor-to-Vehicle rotation angles can also be calibrated with high accuracy. We validate the simulation results in real-world experiments. The code is open source and available on https://github.com/TUMFTM/CaLiV.
Abstract:The goal of extrinsic calibration is the alignment of sensor data to ensure an accurate representation of the surroundings and enable sensor fusion applications. From a safety perspective, sensor calibration is a key enabler of autonomous driving. In the current state of the art, a trend from target-based offline calibration towards targetless online calibration can be observed. However, online calibration is subject to strict real-time and resource constraints which are not met by state-of-the-art methods. This is mainly due to the high number of parameters to estimate, the reliance on geometric features, or the dependence on specific vehicle maneuvers. To meet these requirements and ensure the vehicle's safety at any time, we propose a miscalibration detection framework that shifts the focus from the direct regression of calibration parameters to a binary classification of the calibration state, i.e., calibrated or miscalibrated. Therefore, we propose a contrastive learning approach that compares embedded features in a latent space to classify the calibration state of two different sensor modalities. Moreover, we provide a comprehensive analysis of the feature embeddings and challenging calibration errors that highlight the performance of our approach. As a result, our method outperforms the current state-of-the-art in terms of detection performance, inference time, and resource demand. The code is open source and available on https://github.com/TUMFTM/MiscalibrationDetection.
Abstract:Deep learning models for autonomous driving, encompassing perception, planning, and control, depend on vast datasets to achieve their high performance. However, their generalization often suffers due to domain-specific data distributions, making an effective scene-based categorization of samples necessary to improve their reliability across diverse domains. Manual captioning, though valuable, is both labor-intensive and time-consuming, creating a bottleneck in the data annotation process. Large Visual Language Models (LVLMs) present a compelling solution by automating image analysis and categorization through contextual queries, often without requiring retraining for new categories. In this study, we evaluate the capabilities of LVLMs, including GPT-4 and LLaVA, to understand and classify urban traffic scenes on both an in-house dataset and the BDD100K. We propose a scalable captioning pipeline that integrates state-of-the-art models, enabling a flexible deployment on new datasets. Our analysis, combining quantitative metrics with qualitative insights, demonstrates the effectiveness of LVLMs to understand urban traffic scenarios and highlights their potential as an efficient tool for data-driven advancements in autonomous driving.