Autonomous cars are self-driving vehicles that use artificial intelligence (AI) and sensors to navigate and operate without human intervention, using high-resolution cameras and lidars that detect what happens in the car's immediate surroundings. They have the potential to revolutionize transportation by improving safety, efficiency, and accessibility.
Camera-based Semantic Scene Completion (SSC) is gaining attentions in the 3D perception field. However, properties such as perspective and occlusion lead to the underestimation of the geometry in distant regions, posing a critical issue for safety-focused autonomous driving systems. To tackle this, we propose ScanSSC, a novel camera-based SSC model composed of a Scan Module and Scan Loss, both designed to enhance distant scenes by leveraging context from near-viewpoint scenes. The Scan Module uses axis-wise masked attention, where each axis employing a near-to-far cascade masking that enables distant voxels to capture relationships with preceding voxels. In addition, the Scan Loss computes the cross-entropy along each axis between cumulative logits and corresponding class distributions in a near-to-far direction, thereby propagating rich context-aware signals to distant voxels. Leveraging the synergy between these components, ScanSSC achieves state-of-the-art performance, with IoUs of 44.54 and 48.29, and mIoUs of 17.40 and 20.14 on the SemanticKITTI and SSCBench-KITTI-360 benchmarks.




Visual-inertial odometry (VIO) is widely used in various fields, such as robots, drones, and autonomous vehicles, due to its low cost and complementary sensors. Most VIO methods presuppose that observed objects are static and time-invariant. However, real-world scenes often feature dynamic objects, compromising the accuracy of pose estimation. These moving entities include cars, trucks, buses, motorcycles, and pedestrians. The diversity and partial occlusion of these objects present a tough challenge for existing dynamic object removal techniques. To tackle this challenge, we introduce GMS-VINS, which integrates an enhanced SORT algorithm along with a robust multi-category segmentation framework into VIO, thereby improving pose estimation accuracy in environments with diverse dynamic objects and frequent occlusions. Leveraging the promptable foundation model, our solution efficiently tracks and segments a wide range of object categories. The enhanced SORT algorithm significantly improves the reliability of tracking multiple dynamic objects, especially in urban settings with partial occlusions or swift movements. We evaluated our proposed method using multiple public datasets representing various scenes, as well as in a real-world scenario involving diverse dynamic objects. The experimental results demonstrate that our proposed method performs impressively in multiple scenarios, outperforming other state-of-the-art methods. This highlights its remarkable generalization and adaptability in diverse dynamic environments, showcasing its potential to handle various dynamic objects in practical applications.
Multi-agent systems (MAS) constitute a significant role in exploring machine intelligence and advanced applications. In order to deeply investigate complicated interactions within MAS scenarios, we originally propose "GNN for MBRL" model, which utilizes a state-spaced Graph Neural Networks with Model-based Reinforcement Learning to address specific MAS missions (e.g., Billiard-Avoidance, Autonomous Driving Cars). In detail, we firstly used GNN model to predict future states and trajectories of multiple agents, then applied the Cross-Entropy Method (CEM) optimized Model Predictive Control to assist the ego-agent planning actions and successfully accomplish certain MAS tasks.




Environment perception is a fundamental part of the dynamic driving task executed by Autonomous Driving Systems (ADS). Artificial Intelligence (AI)-based approaches have prevailed over classical techniques for realizing the environment perception. Current safety-relevant standards for automotive systems, International Organization for Standardization (ISO) 26262 and ISO 21448, assume the existence of comprehensive requirements specifications. These specifications serve as the basis on which the functionality of an automotive system can be rigorously tested and checked for compliance with safety regulations. However, AI-based perception systems do not have complete requirements specification. Instead, large datasets are used to train AI-based perception systems. This paper presents a function monitor for the functional runtime monitoring of a two-folded AI-based environment perception for ADS, based respectively on camera and LiDAR sensors. To evaluate the applicability of the function monitor, we conduct a qualitative scenario-based evaluation in a controlled laboratory environment using a model car. The evaluation results then are discussed to provide insights into the monitor's performance and its suitability for real-world applications.




Point cloud segmentation (PCS) aims to make per-point predictions and enables robots and autonomous driving cars to understand the environment. The range image is a dense representation of a large-scale outdoor point cloud, and segmentation models built upon the image commonly execute efficiently. However, the projection of the point cloud onto the range image inevitably leads to dropping points because, at each image coordinate, only one point is kept despite multiple points being projected onto the same location. More importantly, it is challenging to assign correct predictions to the dropped points that belong to the classes different from the kept point class. Besides, existing post-processing methods, such as K-nearest neighbor (KNN) search and kernel point convolution (KPConv), cannot be trained with the models in an end-to-end manner or cannot process varying-density outdoor point clouds well, thereby enabling the models to achieve sub-optimal performance. To alleviate this problem, we propose a trainable pointwise decoder module (PDM) as the post-processing approach, which gathers weighted features from the neighbors and then makes the final prediction for the query point. In addition, we introduce a virtual range image-guided copy-rotate-paste (VRCrop) strategy in data augmentation. VRCrop constrains the total number of points and eliminates undesirable artifacts in the augmented point cloud. With PDM and VRCrop, existing range image-based segmentation models consistently perform better than their counterparts on the SemanticKITTI, SemanticPOSS, and nuScenes datasets.
One of the challenges of Industry 4.0, is to determine and optimize the flow of data, products and materials in manufacturing companies. To realize these challenges, many solutions have been defined such as the utilization of automated guided vehicles (AGVs). However, being guided is a handicap for these vehicles to fully meet the requirements of Industry 4.0 in terms of adaptability and flexibility: the autonomy of vehicles cannot be reduced to predetermined trajectories. Therefore, it is necessary to develop their autonomy. This will be possible by designing new generations of industrial autonomous vehicles (IAVs), in the form of intelligent and cooperative autonomous mobile robots.In the field of road transport, research is very active to make the car autonomous. Many algorithms, solving problematic traffic situations similar to those that can occur in an industrial environment, can be transposed in the industrial field and therefore for IAVs. The technologies standardized in dedicated bodies (e.g., ETSI TC ITS), such as those concerning the exchange of messages between vehicles to increase their awareness or their ability to cooperate, can also be transposed to the industrial context. The deployment of intelligent autonomous vehicle fleets raises several challenges: acceptability by employees, vehicle location, traffic fluidity, vehicle perception of changing environments (dynamic), vehicle-infrastructure cooperation, or vehicles heterogeneity. In this context, developing the autonomy of IAVs requires a relevant working method. The identification of reusable or adaptable algorithms to the various problems raised by the increase in the autonomy of IAVs is not sufficient, it is also necessary to be able to model, to simulate, to test and to experiment with the proposed solutions. Simulation is essential since it allows both to adapt and to validate the algorithms, but also to design and to prepare the experiments.To improve the autonomy of a fleet, we consider the approach relying on a collective intelligence to make the behaviours of vehicles adaptive. In this chapter, we will focus on a class of problems faced by IAVs related to collision and obstacle avoidance. Among these problems, we are particularly interested when two vehicles need to cross an intersection at the same time, known as a deadlock situation. But also, when obstacles are present in the aisles and need to be avoided by the vehicles safely.




The continual evolution of autonomous driving technology requires car-following models that can adapt to diverse and dynamic traffic environments. Traditional learning-based models often suffer from performance degradation when encountering unseen traffic patterns due to a lack of continual learning capabilities. This paper proposes a novel car-following model based on continual learning that addresses this limitation. Our framework incorporates Elastic Weight Consolidation (EWC) and Memory Aware Synapses (MAS) techniques to mitigate catastrophic forgetting and enable the model to learn incrementally from new traffic data streams. We evaluate the performance of the proposed model on the Waymo and Lyft datasets which encompass various traffic scenarios. The results demonstrate that the continual learning techniques significantly outperform the baseline model, achieving 0\% collision rates across all traffic conditions. This research contributes to the advancement of autonomous driving technology by fostering the development of more robust and adaptable car-following models.




Accurate vehicle detection is essential for the development of intelligent transportation systems, autonomous driving, and traffic monitoring. This paper presents a detailed analysis of YOLO11, the latest advancement in the YOLO series of deep learning models, focusing exclusively on vehicle detection tasks. Building upon the success of its predecessors, YOLO11 introduces architectural improvements designed to enhance detection speed, accuracy, and robustness in complex environments. Using a comprehensive dataset comprising multiple vehicle types-cars, trucks, buses, motorcycles, and bicycles we evaluate YOLO11's performance using metrics such as precision, recall, F1 score, and mean average precision (mAP). Our findings demonstrate that YOLO11 surpasses previous versions (YOLOv8 and YOLOv10) in detecting smaller and more occluded vehicles while maintaining a competitive inference time, making it well-suited for real-time applications. Comparative analysis shows significant improvements in the detection of complex vehicle geometries, further contributing to the development of efficient and scalable vehicle detection systems. This research highlights YOLO11's potential to enhance autonomous vehicle performance and traffic monitoring systems, offering insights for future developments in the field.




This paper presents an autonomous method to address challenges arising from severe lighting conditions in machine vision applications that use event cameras. To manage these conditions, the research explores the built in potential of these cameras to adjust pixel functionality, named bias settings. As cars are driven at various times and locations, shifts in lighting conditions are unavoidable. Consequently, this paper utilizes the neuromorphic YOLO-based face tracking module of a driver monitoring system as the event-based application to study. The proposed method uses numerical metrics to continuously monitor the performance of the event-based application in real-time. When the application malfunctions, the system detects this through a drop in the metrics and automatically adjusts the event cameras bias values. The Nelder-Mead simplex algorithm is employed to optimize this adjustment, with finetuning continuing until performance returns to a satisfactory level. The advantage of bias optimization lies in its ability to handle conditions such as flickering or darkness without requiring additional hardware or software. To demonstrate the capabilities of the proposed system, it was tested under conditions where detecting human faces with default bias values was impossible. These severe conditions were simulated using dim ambient light and various flickering frequencies. Following the automatic and dynamic process of bias modification, the metrics for face detection significantly improved under all conditions. Autobiasing resulted in an increase in the YOLO confidence indicators by more than 33 percent for object detection and 37 percent for face detection highlighting the effectiveness of the proposed method.




This work presents the experiments and solution outline for our teams winning submission in the Learn To Race Autonomous Racing Virtual Challenge 2022 hosted by AIcrowd. The objective of the Learn-to-Race competition is to push the boundary of autonomous technology, with a focus on achieving the safety benefits of autonomous driving. In the description the competition is framed as a reinforcement learning (RL) challenge. We focused our initial efforts on implementation of Soft Actor Critic (SAC) variants. Our goal was to learn non-trivial control of the race car exclusively from visual and geometric features, directly mapping pixels to control actions. We made suitable modifications to the default reward policy aiming to promote smooth steering and acceleration control. The framework for the competition provided real time simulation, meaning a single episode (learning experience) is measured in minutes. Instead of pursuing parallelisation of episodes we opted to explore a more traditional approach in which the visual perception was processed (via learned operators) and fed into rule-based controllers. Such a system, while not as academically "attractive" as a pixels-to-actions approach, results in a system that requires less training, is more explainable, generalises better and is easily tuned and ultimately out-performed all other agents in the competition by a large margin.