Autonomous cars are self-driving vehicles that use artificial intelligence (AI) and sensors to navigate and operate without human intervention, using high-resolution cameras and lidars that detect what happens in the car's immediate surroundings. They have the potential to revolutionize transportation by improving safety, efficiency, and accessibility.




Environment perception is a fundamental part of the dynamic driving task executed by Autonomous Driving Systems (ADS). Artificial Intelligence (AI)-based approaches have prevailed over classical techniques for realizing the environment perception. Current safety-relevant standards for automotive systems, International Organization for Standardization (ISO) 26262 and ISO 21448, assume the existence of comprehensive requirements specifications. These specifications serve as the basis on which the functionality of an automotive system can be rigorously tested and checked for compliance with safety regulations. However, AI-based perception systems do not have complete requirements specification. Instead, large datasets are used to train AI-based perception systems. This paper presents a function monitor for the functional runtime monitoring of a two-folded AI-based environment perception for ADS, based respectively on camera and LiDAR sensors. To evaluate the applicability of the function monitor, we conduct a qualitative scenario-based evaluation in a controlled laboratory environment using a model car. The evaluation results then are discussed to provide insights into the monitor's performance and its suitability for real-world applications.




Autonomous racing has rapidly gained research attention. Traditionally, racing cars rely on 2D LiDAR as their primary visual system. In this work, we explore the integration of an event camera with the existing system to provide enhanced temporal information. Our goal is to fuse the 2D LiDAR data with event data in an end-to-end learning framework for steering prediction, which is crucial for autonomous racing. To the best of our knowledge, this is the first study addressing this challenging research topic. We start by creating a multisensor dataset specifically for steering prediction. Using this dataset, we establish a benchmark by evaluating various SOTA fusion methods. Our observations reveal that existing methods often incur substantial computational costs. To address this, we apply low-rank techniques to propose a novel, efficient, and effective fusion design. We introduce a new fusion learning policy to guide the fusion process, enhancing robustness against misalignment. Our fusion architecture provides better steering prediction than LiDAR alone, significantly reducing the RMSE from 7.72 to 1.28. Compared to the second-best fusion method, our work represents only 11% of the learnable parameters while achieving better accuracy. The source code, dataset, and benchmark will be released to promote future research.




Car-following (CF) modeling, a fundamental component in microscopic traffic simulation, has attracted increasing interest of researchers in the past decades. In this study, we propose an adaptable personalized car-following framework -MetaFollower, by leveraging the power of meta-learning. Specifically, we first utilize Model-Agnostic Meta-Learning (MAML) to extract common driving knowledge from various CF events. Afterward, the pre-trained model can be fine-tuned on new drivers with only a few CF trajectories to achieve personalized CF adaptation. We additionally combine Long Short-Term Memory (LSTM) and Intelligent Driver Model (IDM) to reflect temporal heterogeneity with high interpretability. Unlike conventional adaptive cruise control (ACC) systems that rely on predefined settings and constant parameters without considering heterogeneous driving characteristics, MetaFollower can accurately capture and simulate the intricate dynamics of car-following behavior while considering the unique driving styles of individual drivers. We demonstrate the versatility and adaptability of MetaFollower by showcasing its ability to adapt to new drivers with limited training data quickly. To evaluate the performance of MetaFollower, we conduct rigorous experiments comparing it with both data-driven and physics-based models. The results reveal that our proposed framework outperforms baseline models in predicting car-following behavior with higher accuracy and safety. To the best of our knowledge, this is the first car-following model aiming to achieve fast adaptation by considering both driver and temporal heterogeneity based on meta-learning.




In this technical report, we present our solution for the Vision-Centric 3D Occupancy and Flow Prediction track in the nuScenes Open-Occ Dataset Challenge at CVPR 2024. Our innovative approach involves a dual-stage framework that enhances 3D occupancy and flow predictions by incorporating adaptive forward view transformation and flow modeling. Initially, we independently train the occupancy model, followed by flow prediction using sequential frame integration. Our method combines regression with classification to address scale variations in different scenes, and leverages predicted flow to warp current voxel features to future frames, guided by future frame ground truth. Experimental results on the nuScenes dataset demonstrate significant improvements in accuracy and robustness, showcasing the effectiveness of our approach in real-world scenarios. Our single model based on Swin-Base ranks second on the public leaderboard, validating the potential of our method in advancing autonomous car perception systems.
Multi-agent systems (MAS) constitute a significant role in exploring machine intelligence and advanced applications. In order to deeply investigate complicated interactions within MAS scenarios, we originally propose "GNN for MBRL" model, which utilizes a state-spaced Graph Neural Networks with Model-based Reinforcement Learning to address specific MAS missions (e.g., Billiard-Avoidance, Autonomous Driving Cars). In detail, we firstly used GNN model to predict future states and trajectories of multiple agents, then applied the Cross-Entropy Method (CEM) optimized Model Predictive Control to assist the ego-agent planning actions and successfully accomplish certain MAS tasks.




Accurate vehicle detection is essential for the development of intelligent transportation systems, autonomous driving, and traffic monitoring. This paper presents a detailed analysis of YOLO11, the latest advancement in the YOLO series of deep learning models, focusing exclusively on vehicle detection tasks. Building upon the success of its predecessors, YOLO11 introduces architectural improvements designed to enhance detection speed, accuracy, and robustness in complex environments. Using a comprehensive dataset comprising multiple vehicle types-cars, trucks, buses, motorcycles, and bicycles we evaluate YOLO11's performance using metrics such as precision, recall, F1 score, and mean average precision (mAP). Our findings demonstrate that YOLO11 surpasses previous versions (YOLOv8 and YOLOv10) in detecting smaller and more occluded vehicles while maintaining a competitive inference time, making it well-suited for real-time applications. Comparative analysis shows significant improvements in the detection of complex vehicle geometries, further contributing to the development of efficient and scalable vehicle detection systems. This research highlights YOLO11's potential to enhance autonomous vehicle performance and traffic monitoring systems, offering insights for future developments in the field.




Point cloud segmentation (PCS) aims to make per-point predictions and enables robots and autonomous driving cars to understand the environment. The range image is a dense representation of a large-scale outdoor point cloud, and segmentation models built upon the image commonly execute efficiently. However, the projection of the point cloud onto the range image inevitably leads to dropping points because, at each image coordinate, only one point is kept despite multiple points being projected onto the same location. More importantly, it is challenging to assign correct predictions to the dropped points that belong to the classes different from the kept point class. Besides, existing post-processing methods, such as K-nearest neighbor (KNN) search and kernel point convolution (KPConv), cannot be trained with the models in an end-to-end manner or cannot process varying-density outdoor point clouds well, thereby enabling the models to achieve sub-optimal performance. To alleviate this problem, we propose a trainable pointwise decoder module (PDM) as the post-processing approach, which gathers weighted features from the neighbors and then makes the final prediction for the query point. In addition, we introduce a virtual range image-guided copy-rotate-paste (VRCrop) strategy in data augmentation. VRCrop constrains the total number of points and eliminates undesirable artifacts in the augmented point cloud. With PDM and VRCrop, existing range image-based segmentation models consistently perform better than their counterparts on the SemanticKITTI, SemanticPOSS, and nuScenes datasets.
One of the challenges of Industry 4.0, is to determine and optimize the flow of data, products and materials in manufacturing companies. To realize these challenges, many solutions have been defined such as the utilization of automated guided vehicles (AGVs). However, being guided is a handicap for these vehicles to fully meet the requirements of Industry 4.0 in terms of adaptability and flexibility: the autonomy of vehicles cannot be reduced to predetermined trajectories. Therefore, it is necessary to develop their autonomy. This will be possible by designing new generations of industrial autonomous vehicles (IAVs), in the form of intelligent and cooperative autonomous mobile robots.In the field of road transport, research is very active to make the car autonomous. Many algorithms, solving problematic traffic situations similar to those that can occur in an industrial environment, can be transposed in the industrial field and therefore for IAVs. The technologies standardized in dedicated bodies (e.g., ETSI TC ITS), such as those concerning the exchange of messages between vehicles to increase their awareness or their ability to cooperate, can also be transposed to the industrial context. The deployment of intelligent autonomous vehicle fleets raises several challenges: acceptability by employees, vehicle location, traffic fluidity, vehicle perception of changing environments (dynamic), vehicle-infrastructure cooperation, or vehicles heterogeneity. In this context, developing the autonomy of IAVs requires a relevant working method. The identification of reusable or adaptable algorithms to the various problems raised by the increase in the autonomy of IAVs is not sufficient, it is also necessary to be able to model, to simulate, to test and to experiment with the proposed solutions. Simulation is essential since it allows both to adapt and to validate the algorithms, but also to design and to prepare the experiments.To improve the autonomy of a fleet, we consider the approach relying on a collective intelligence to make the behaviours of vehicles adaptive. In this chapter, we will focus on a class of problems faced by IAVs related to collision and obstacle avoidance. Among these problems, we are particularly interested when two vehicles need to cross an intersection at the same time, known as a deadlock situation. But also, when obstacles are present in the aisles and need to be avoided by the vehicles safely.




This paper presents an autonomous method to address challenges arising from severe lighting conditions in machine vision applications that use event cameras. To manage these conditions, the research explores the built in potential of these cameras to adjust pixel functionality, named bias settings. As cars are driven at various times and locations, shifts in lighting conditions are unavoidable. Consequently, this paper utilizes the neuromorphic YOLO-based face tracking module of a driver monitoring system as the event-based application to study. The proposed method uses numerical metrics to continuously monitor the performance of the event-based application in real-time. When the application malfunctions, the system detects this through a drop in the metrics and automatically adjusts the event cameras bias values. The Nelder-Mead simplex algorithm is employed to optimize this adjustment, with finetuning continuing until performance returns to a satisfactory level. The advantage of bias optimization lies in its ability to handle conditions such as flickering or darkness without requiring additional hardware or software. To demonstrate the capabilities of the proposed system, it was tested under conditions where detecting human faces with default bias values was impossible. These severe conditions were simulated using dim ambient light and various flickering frequencies. Following the automatic and dynamic process of bias modification, the metrics for face detection significantly improved under all conditions. Autobiasing resulted in an increase in the YOLO confidence indicators by more than 33 percent for object detection and 37 percent for face detection highlighting the effectiveness of the proposed method.




In the field of autonomous surface vehicles (ASVs), devising decision-making and obstacle avoidance solutions that address maritime COLREGs (Collision Regulations), primarily defined for human operators, has long been a pressing challenge. Recent advancements in explainable Artificial Intelligence (AI) and machine learning have shown promise in enabling human-like decision-making. Notably, significant developments have occurred in the application of Large Language Models (LLMs) to the decision-making of complex systems, such as self-driving cars. The textual and somewhat ambiguous nature of COLREGs (from an algorithmic perspective), however, poses challenges that align well with the capabilities of LLMs, suggesting that LLMs may become increasingly suitable for this application soon. This paper presents and demonstrates the first application of LLM-based decision-making and control for ASVs. The proposed method establishes a high-level decision-maker that uses online collision risk indices and key measurements to make decisions for safe manoeuvres. A tailored design and runtime structure is developed to support training and real-time action generation on a realistic ASV model. Local planning and control algorithms are integrated to execute the commands for waypoint following and collision avoidance at a lower level. To the authors' knowledge, this study represents the first attempt to apply explainable AI to the dynamic control problem of maritime systems recognising the COLREGs rules, opening new avenues for research in this challenging area. Results obtained across multiple test scenarios demonstrate the system's ability to maintain online COLREGs compliance, accurate waypoint tracking, and feasible control, while providing human-interpretable reasoning for each decision.