Autonomous cars are self-driving vehicles that use artificial intelligence (AI) and sensors to navigate and operate without human intervention, using high-resolution cameras and lidars that detect what happens in the car's immediate surroundings. They have the potential to revolutionize transportation by improving safety, efficiency, and accessibility.
As automated vehicles enter public roads, safety in a near-infinite number of driving scenarios becomes one of the major concerns for the widespread adoption of fully autonomous driving. The ability to detect anomalous situations outside of the operational design domain is a key component in self-driving cars, enabling us to mitigate the impact of abnormal ego behaviors and to realize trustworthy driving systems. On-road anomaly detection in egocentric videos remains a challenging problem due to the difficulties introduced by complex and interactive scenarios. We conduct a holistic analysis of common on-road anomaly patterns, from which we propose three unsupervised anomaly detection experts: a scene expert that focuses on frame-level appearances to detect abnormal scenes and unexpected scene motions; an interaction expert that models normal relative motions between two road participants and raises alarms whenever anomalous interactions emerge; and a behavior expert which monitors abnormal behaviors of individual objects by future trajectory prediction. To combine the strengths of all the modules, we propose an expert ensemble (Xen) using a Kalman filter, in which the final anomaly score is absorbed as one of the states and the observations are generated by the experts. Our experiments employ a novel evaluation protocol for realistic model performance, demonstrate superior anomaly detection performance than previous methods, and show that our framework has potential in classifying anomaly types using unsupervised learning on a large-scale on-road anomaly dataset.




Self-driving cars relying solely on ego-centric perception face limitations in sensing, often failing to detect occluded, faraway objects. Collaborative autonomous driving (CAV) seems like a promising direction, but collecting data for development is non-trivial. It requires placing multiple sensor-equipped agents in a real-world driving scene, simultaneously! As such, existing datasets are limited in locations and agents. We introduce a novel surrogate to the rescue, which is to generate realistic perception from different viewpoints in a driving scene, conditioned on a real-world sample - the ego-car's sensory data. This surrogate has huge potential: it could potentially turn any ego-car dataset into a collaborative driving one to scale up the development of CAV. We present the very first solution, using a combination of simulated collaborative data and real ego-car data. Our method, Transfer Your Perspective (TYP), learns a conditioned diffusion model whose output samples are not only realistic but also consistent in both semantics and layouts with the given ego-car data. Empirical results demonstrate TYP's effectiveness in aiding in a CAV setting. In particular, TYP enables us to (pre-)train collaborative perception algorithms like early and late fusion with little or no real-world collaborative data, greatly facilitating downstream CAV applications.




The dispersion problem has received much attention recently in the distributed computing literature. In this problem, $k\leq n$ agents placed initially arbitrarily on the nodes of an $n$-node, $m$-edge anonymous graph of maximum degree $\Delta$ have to reposition autonomously to reach a configuration in which each agent is on a distinct node of the graph. Dispersion is interesting as well as important due to its connections to many fundamental coordination problems by mobile agents on graphs, such as exploration, scattering, load balancing, relocation of self-driven electric cars (robots) to recharge stations (nodes), etc. The objective has been to provide a solution that optimizes simultaneously time and memory complexities. There exist graphs for which the lower bound on time complexity is $\Omega(k)$. Memory complexity is $\Omega(\log k)$ per agent independent of graph topology. The state-of-the-art algorithms have (i) time complexity $O(k\log^2k)$ and memory complexity $O(\log(k+\Delta))$ under the synchronous setting [DISC'24] and (ii) time complexity $O(\min\{m,k\Delta\})$ and memory complexity $O(\log(k+\Delta))$ under the asynchronous setting [OPODIS'21]. In this paper, we improve substantially on this state-of-the-art. Under the synchronous setting as in [DISC'24], we present the first optimal $O(k)$ time algorithm keeping memory complexity $O(\log (k+\Delta))$. Under the asynchronous setting as in [OPODIS'21], we present the first algorithm with time complexity $O(k\log k)$ keeping memory complexity $O(\log (k+\Delta))$, which is time-optimal within an $O(\log k)$ factor despite asynchrony. Both results were obtained through novel techniques to quickly find empty nodes to settle agents, which may be of independent interest.




Simulators are indispensable for research in autonomous systems such as self-driving cars, autonomous robots and drones. Despite significant progress in various simulation aspects, such as graphical realism, an evident gap persists between the virtual and real-world environments. Since the ultimate goal is to deploy the autonomous systems in the real world, closing the sim2real gap is of utmost importance. In this paper, we employ a state-ofthe-art approach to enhance the photorealism of simulated data, aligning them with the visual characteristics of real-world datasets. Based on this, we developed CARLA2Real, an easy-to-use, publicly available tool (plug-in) for the widely used and open-source CARLA simulator. This tool enhances the output of CARLA in near realtime, achieving a frame rate of 13 FPS, translating it to the visual style and realism of real-world datasets such as Cityscapes, KITTI, and Mapillary Vistas. By employing the proposed tool, we generated synthetic datasets from both the simulator and the enhancement model outputs, including their corresponding ground truth annotations for tasks related to autonomous driving. Then, we performed a number of experiments to evaluate the impact of the proposed approach on feature extraction and semantic segmentation methods when trained on the enhanced synthetic data. The results demonstrate that the sim2real gap is significant and can indeed be reduced by the introduced approach.




Motion planning in uncertain environments like complex urban areas is a key challenge for autonomous vehicles (AVs). The aim of our research is to investigate how AVs can navigate crowded, unpredictable scenarios with multiple pedestrians while maintaining a safe and efficient vehicle behavior. So far, most research has concentrated on static or deterministic traffic participant behavior. This paper introduces a novel algorithm for motion planning in crowded spaces by combining social force principles for simulating realistic pedestrian behavior with a risk-aware motion planner. We evaluate this new algorithm in a 2D simulation environment to rigorously assess AV-pedestrian interactions, demonstrating that our algorithm enables safe, efficient, and adaptive motion planning, particularly in highly crowded urban environments - a first in achieving this level of performance. This study has not taken into consideration real-time constraints and has been shown only in simulation so far. Further studies are needed to investigate the novel algorithm in a complete software stack for AVs on real cars to investigate the entire perception, planning and control pipeline in crowded scenarios. We release the code developed in this research as an open-source resource for further studies and development. It can be accessed at the following link: https://github.com/TUM-AVS/PedestrianAwareMotionPlanning
Reliable state estimation is essential for autonomous systems operating in complex, noisy environments. Classical filtering approaches, such as the Kalman filter, can struggle when facing nonlinear dynamics or non-Gaussian noise, and even more flexible particle filters often encounter sample degeneracy or high computational costs in large-scale domains. Meanwhile, adaptive machine learning techniques, including Q-learning and neuroevolutionary algorithms such as NEAT, rely heavily on accurate state feedback to guide learning; when sensor data are imperfect, these methods suffer from degraded convergence and suboptimal performance. In this paper, we propose an integrated framework that unifies particle filtering with Q-learning and NEAT to explicitly address the challenge of noisy measurements. By refining radar-based observations into reliable state estimates, our particle filter drives more stable policy updates (in Q-learning) or controller evolution (in NEAT), allowing both reinforcement learning and neuroevolution to converge faster, achieve higher returns or fitness, and exhibit greater resilience to sensor uncertainty. Experiments on grid-based navigation and a simulated car environment highlight consistent gains in training stability, final performance, and success rates over baselines lacking advanced filtering. Altogether, these findings underscore that accurate state estimation is not merely a preprocessing step, but a vital component capable of substantially enhancing adaptive machine learning in real-world applications plagued by sensor noise.




Autonomous racing is gaining attention for its potential to advance autonomous vehicle technologies. Accurate race car dynamics modeling is essential for capturing and predicting future states like position, orientation, and velocity. However, accurately modeling complex subsystems such as tires and suspension poses significant challenges. In this paper, we introduce the Deep Kernel-based Multi-task Gaussian Process (DKMGP), which leverages the structure of a variational multi-task and multi-step Gaussian process model enhanced with deep kernel learning for vehicle dynamics modeling. Unlike existing single-step methods, DKMGP performs multi-step corrections with an adaptive correction horizon (ACH) algorithm that dynamically adjusts to varying driving conditions. To validate and evaluate the proposed DKMGP method, we compare the model performance with DKL-SKIP and a well-tuned single-track model, using high-speed dynamics data (exceeding 230kmph) collected from a full-scale Indy race car during the Indy Autonomous Challenge held at the Las Vegas Motor Speedway at CES 2024. The results demonstrate that DKMGP achieves upto 99% prediction accuracy compared to one-step DKL-SKIP, while improving real-time computational efficiency by 1752x. Our results show that DKMGP is a scalable and efficient solution for vehicle dynamics modeling making it suitable for high-speed autonomous racing control.




A multiagent sequential decision problem has been seen in many critical applications including urban transportation, autonomous driving cars, military operations, etc. Its widely known solution, namely multiagent reinforcement learning, has evolved tremendously in recent years. Among them, the solution paradigm of modeling other agents attracts our interest, which is different from traditional value decomposition or communication mechanisms. It enables agents to understand and anticipate others' behaviors and facilitates their collaboration. Inspired by recent research on the legibility that allows agents to reveal their intentions through their behavior, we propose a multiagent active legibility framework to improve their performance. The legibility-oriented framework allows agents to conduct legible actions so as to help others optimise their behaviors. In addition, we design a series of problem domains that emulate a common scenario and best characterize the legibility in multiagent reinforcement learning. The experimental results demonstrate that the new framework is more efficient and costs less training time compared to several multiagent reinforcement learning algorithms.




Trajectory prediction of agents is crucial for the safety of autonomous vehicles, whereas previous approaches usually rely on sufficiently long-observed trajectory to predict the future trajectory of the agents. However, in real-world scenarios, it is not realistic to collect adequate observed locations for moving agents, leading to the collapse of most prediction models. For instance, when a moving car suddenly appears and is very close to an autonomous vehicle because of the obstruction, it is quite necessary for the autonomous vehicle to quickly and accurately predict the future trajectories of the car with limited observed trajectory locations. In light of this, we focus on investigating the task of instantaneous trajectory prediction, i.e., two observed locations are available during inference. To this end, we propose a general and plug-and-play instantaneous trajectory prediction approach, called ITPNet. Specifically, we propose a backward forecasting mechanism to reversely predict the latent feature representations of unobserved historical trajectories of the agent based on its two observed locations and then leverage them as complementary information for future trajectory prediction. Meanwhile, due to the inevitable existence of noise and redundancy in the predicted latent feature representations, we further devise a Noise Redundancy Reduction Former, aiming at to filter out noise and redundancy from unobserved trajectories and integrate the filtered features and observed features into a compact query for future trajectory predictions. In essence, ITPNet can be naturally compatible with existing trajectory prediction models, enabling them to gracefully handle the case of instantaneous trajectory prediction. Extensive experiments on the Argoverse and nuScenes datasets demonstrate ITPNet outperforms the baselines, and its efficacy with different trajectory prediction models.




The paper presents a vision-based obstacle avoidance strategy for lightweight self-driving cars that can be run on a CPU-only device using a single RGB-D camera. The method consists of two steps: visual perception and path planning. The visual perception part uses ORBSLAM3 enhanced with optical flow to estimate the car's poses and extract rich texture information from the scene. In the path planning phase, we employ a method combining a control Lyapunov function and control barrier function in the form of quadratic program (CLF-CBF-QP) together with an obstacle shape reconstruction process (SRP) to plan safe and stable trajectories. To validate the performance and robustness of the proposed method, simulation experiments were conducted with a car in various complex indoor environments using the Gazebo simulation environment. Our method can effectively avoid obstacles in the scenes. The proposed algorithm outperforms benchmark algorithms in achieving more stable and shorter trajectories across multiple simulated scenes.