What is autonomous cars? Autonomous cars are self-driving vehicles that use artificial intelligence (AI) and sensors to navigate and operate without human intervention, using high-resolution cameras and lidars that detect what happens in the car's immediate surroundings. They have the potential to revolutionize transportation by improving safety, efficiency, and accessibility.
Papers and Code
Jun 17, 2024
Abstract:Vehicle trajectory prediction is crucial for advancing autonomous driving and advanced driver assistance systems (ADAS), enhancing road safety and traffic efficiency. While traditional methods have laid foundational work, modern deep learning techniques, particularly transformer-based models and generative approaches, have significantly improved prediction accuracy by capturing complex and non-linear patterns in vehicle motion and traffic interactions. However, these models often overlook the detailed car-following behaviors and inter-vehicle interactions essential for real-world driving scenarios. This study introduces a Cross-Attention Transformer Enhanced Conditional Diffusion Model (Crossfusor) specifically designed for car-following trajectory prediction. Crossfusor integrates detailed inter-vehicular interactions and car-following dynamics into a robust diffusion framework, improving both the accuracy and realism of predicted trajectories. The model leverages a novel temporal feature encoding framework combining GRU, location-based attention mechanisms, and Fourier embedding to capture historical vehicle dynamics. It employs noise scaled by these encoded historical features in the forward diffusion process, and uses a cross-attention transformer to model intricate inter-vehicle dependencies in the reverse denoising process. Experimental results on the NGSIM dataset demonstrate that Crossfusor outperforms state-of-the-art models, particularly in long-term predictions, showcasing its potential for enhancing the predictive capabilities of autonomous driving systems.
Via

May 27, 2024
Abstract:Automotive radar emerges as a crucial sensor for autonomous vehicle perception. As more cars are equipped radars, radar interference is an unavoidable challenge. Unlike conventional approaches such as interference mitigation and interference-avoiding technologies, this paper introduces an innovative collaborative sensing scheme with multiple automotive radars that exploits constructive interference. Through collaborative sensing, our method optimally aligns cross-path interference signals from other radars with another radar's self-echo signals, thereby significantly augmenting its target detection capabilities. This approach alleviates the need for extensive raw data sharing between collaborating radars. Instead, only an optimized weighting matrix needs to be exchanged between the radars. This approach considerably decreases the data bandwidth requirements for the wireless channel, making it a more feasible and practical solution for automotive radar collaboration. Numerical results demonstrate the effectiveness of the constructive interference approach for enhanced object detection capability.
* paper accepted by IEEE SAM Workshop 2024
Via

Jun 23, 2024
Abstract:In the realm of driving technologies, fully autonomous vehicles have not been widely adopted yet, making advanced driver assistance systems (ADAS) crucial for enhancing driving experiences. Adaptive Cruise Control (ACC) emerges as a pivotal component of ADAS. However, current ACC systems often employ fixed settings, failing to intuitively capture drivers' social preferences and leading to potential function disengagement. To overcome these limitations, we propose the Editable Behavior Generation (EBG) model, a data-driven car-following model that allows for adjusting driving discourtesy levels. The framework integrates diverse courtesy calculation methods into long short-term memory (LSTM) and Transformer architectures, offering a comprehensive approach to capture nuanced driving dynamics. By integrating various discourtesy values during the training process, our model generates realistic agent trajectories with different levels of courtesy in car-following behavior. Experimental results on the HighD and Waymo datasets showcase a reduction in Mean Squared Error (MSE) of spacing and MSE of speed compared to baselines, establishing style controllability. To the best of our knowledge, this work represents the first data-driven car-following model capable of dynamically adjusting discourtesy levels. Our model provides valuable insights for the development of ACC systems that take into account drivers' social preferences.
Via

Jun 18, 2024
Abstract:As autonomous vehicles continue to revolutionize transportation, addressing challenges posed by adverse weather conditions, particularly during winter, becomes paramount for ensuring safe and efficient operations. One of the most important aspects of a road safety inspection during adverse weather is when a limited lane width can reduce the capacity of the road and raise the risk of serious accidents involving autonomous vehicles. In this research, a method for improving driving challenges on roads in winter conditions, with a model that segments and estimates the width of the road from the perspectives of Uncrewed aerial vehicles and autonomous vehicles. The proposed approach in this article is needed to empower self-driving cars with up-to-date and accurate insights, enhancing their adaptability and decision-making capabilities in winter landscapes.
Via

Aug 02, 2024
Abstract:LiDAR-based sensors employing optical spectrum signals play a vital role in providing significant information about the target objects in autonomous driving vehicle systems. However, the presence of fog in the atmosphere severely degrades the overall system's performance. This manuscript analyzes the role of fog particle size distributions in 3D object detection under adverse weather conditions. We utilise Mie theory and meteorological optical range (MOR) to calculate the attenuation and backscattering coefficient values for point cloud generation and analyze the overall system's accuracy in Car, Cyclist, and Pedestrian case scenarios under easy, medium and hard detection difficulties. Gamma and Junge (Power-Law) distributions are employed to mathematically model the fog particle size distribution under strong and moderate advection fog environments. Subsequently, we modified the KITTI dataset based on the backscattering coefficient values and trained it on the PV-RCNN++ deep neural network model for Car, Cyclist, and Pedestrian cases under different detection difficulties. The result analysis shows a significant variation in the system's accuracy concerning the changes in target object dimensionality, the nature of the fog environment and increasing detection difficulties, with the Car exhibiting the highest accuracy of around 99% and the Pedestrian showing the lowest accuracy of around 73%.
Via

Aug 27, 2024
Abstract:Robust and realistic rendering for large-scale road scenes is essential in autonomous driving simulation. Recently, 3D Gaussian Splatting (3D-GS) has made groundbreaking progress in neural rendering, but the general fidelity of large-scale road scene renderings is often limited by the input imagery, which usually has a narrow field of view and focuses mainly on the street-level local area. Intuitively, the data from the drone's perspective can provide a complementary viewpoint for the data from the ground vehicle's perspective, enhancing the completeness of scene reconstruction and rendering. However, training naively with aerial and ground images, which exhibit large view disparity, poses a significant convergence challenge for 3D-GS, and does not demonstrate remarkable improvements in performance on road views. In order to enhance the novel view synthesis of road views and to effectively use the aerial information, we design an uncertainty-aware training method that allows aerial images to assist in the synthesis of areas where ground images have poor learning outcomes instead of weighting all pixels equally in 3D-GS training like prior work did. We are the first to introduce the cross-view uncertainty to 3D-GS by matching the car-view ensemble-based rendering uncertainty to aerial images, weighting the contribution of each pixel to the training process. Additionally, to systematically quantify evaluation metrics, we assemble a high-quality synthesized dataset comprising both aerial and ground images for road scenes.
Via

Jul 10, 2024
Abstract:Generating 3D vehicle assets from in-the-wild observations is crucial to autonomous driving. Existing image-to-3D methods cannot well address this problem because they learn generation merely from image RGB information without a deeper understanding of in-the-wild vehicles (such as car models, manufacturers, etc.). This leads to their poor zero-shot prediction capability to handle real-world observations with occlusion or tricky viewing angles. To solve this problem, in this work, we propose VQA-Diff, a novel framework that leverages in-the-wild vehicle images to create photorealistic 3D vehicle assets for autonomous driving. VQA-Diff exploits the real-world knowledge inherited from the Large Language Model in the Visual Question Answering (VQA) model for robust zero-shot prediction and the rich image prior knowledge in the Diffusion model for structure and appearance generation. In particular, we utilize a multi-expert Diffusion Models strategy to generate the structure information and employ a subject-driven structure-controlled generation mechanism to model appearance information. As a result, without the necessity to learn from a large-scale image-to-3D vehicle dataset collected from the real world, VQA-Diff still has a robust zero-shot image-to-novel-view generation ability. We conduct experiments on various datasets, including Pascal 3D+, Waymo, and Objaverse, to demonstrate that VQA-Diff outperforms existing state-of-the-art methods both qualitatively and quantitatively.
Via

Aug 22, 2024
Abstract:Social norm is defined as a shared standard of acceptable behavior in a society. The emergence of social norms fosters coordination among agents without any hard-coded rules, which is crucial for the large-scale deployment of AVs in an intelligent transportation system. This paper explores the application of LLMs in understanding and modeling social norms in autonomous driving games. We introduce LLMs into autonomous driving games as intelligent agents who make decisions according to text prompts. These agents are referred to as LLM-based agents. Our framework involves LLM-based agents playing Markov games in a multi-agent system (MAS), allowing us to investigate the emergence of social norms among individual agents. We aim to identify social norms by designing prompts and utilizing LLMs on textual information related to the environment setup and the observations of LLM-based agents. Using the OpenAI Chat API powered by GPT-4.0, we conduct experiments to simulate interactions and evaluate the performance of LLM-based agents in two driving scenarios: unsignalized intersection and highway platoon. The results show that LLM-based agents can handle dynamically changing environments in Markov games, and social norms evolve among LLM-based agents in both scenarios. In the intersection game, LLM-based agents tend to adopt a conservative driving policy when facing a potential car crash. The advantage of LLM-based agents in games lies in their strong operability and analyzability, which facilitate experimental design.
Via

Jun 12, 2024
Abstract:A fundamental task in mobile robotics is keeping an intelligent agent under surveillance with an autonomous robot as it travels in the environment. This work studies a version of that problem involving one of the most popular vehicle platforms in robotics. In particular, we consider two identical Dubins cars moving on a plane without obstacles. One of them plays as the pursuer, and it is equipped with a limited field-of-view detection region modeled as a semi-infinite cone with its apex at the pursuer's position. The pursuer aims to maintain the other Dubins car, which plays as the evader, as much time as possible inside its detection region. On the contrary, the evader wants to escape as soon as possible. In this work, employing differential game theory, we find the time-optimal motion strategies near the game's end. The analysis of those trajectories reveals the existence of at least two singular surfaces: a Transition Surface and an Evader's Universal Surface. We also found that the barrier's standard construction produces a surface that partially lies outside the playing space and fails to define a closed region, implying that an additional procedure is required to determine all configurations where the evader escapes.
* 8 pages, 9 figures
Via

May 28, 2024
Abstract:Unveiling an empty street from crowded observations captured by in-car cameras is crucial for autonomous driving. However, removing all temporary static objects, such as stopped vehicles and standing pedestrians, presents a significant challenge. Unlike object-centric 3D inpainting, which relies on thorough observation in a small scene, street scenes involve long trajectories that differ from previous 3D inpainting tasks. The camera-centric moving environment of captured videos further complicates the task due to the limited degree and time duration of object observation. To address these obstacles, we introduce StreetUnveiler to reconstruct an empty street. StreetUnveiler learns a 3D representation of the empty street from crowded observations. Our representation is based on the hard-label semantic 2D Gaussian Splatting (2DGS) for its scalability and ability to identify Gaussians to be removed. We inpaint rendered image after removing unwanted Gaussians to provide pseudo-labels and subsequently re-optimize the 2DGS. Given its temporal continuous movement, we divide the empty street scene into observed, partial-observed, and unobserved regions, which we propose to locate through a rendered alpha map. This decomposition helps us to minimize the regions that need to be inpainted. To enhance the temporal consistency of the inpainting, we introduce a novel time-reversal framework to inpaint frames in reverse order and use later frames as references for earlier frames to fully utilize the long-trajectory observations. Our experiments conducted on the street scene dataset successfully reconstructed a 3D representation of the empty street. The mesh representation of the empty street can be extracted for further applications. Project page and more visualizations can be found at: https://streetunveiler.github.io
Via
