The widescale deployment of Autonomous Vehicles (AV) seems to be imminent despite many safety challenges that are yet to be resolved. It is well known that there are no universally agreed Verification and Validation (VV) methodologies to guarantee absolute safety, which is crucial for the acceptance of this technology. Existing standards focus on deterministic processes where the validation requires only a set of test cases that cover the requirements. Modern autonomous vehicles will undoubtedly include machine learning and probabilistic techniques that require a much more comprehensive testing regime due to the non-deterministic nature of the operating design domain. A rigourous statistical validation process is an essential component required to address this challenge. Most research in this area focuses on evaluating system performance in large scale real-world data gathering exercises (number of miles travelled), or randomised test scenarios in simulation. This paper presents a new approach to compute the statistical characteristics of a system's behaviour by biasing automatically generated test cases towards the worst case scenarios, identifying potential unsafe edge cases.We use reinforcement learning (RL) to learn the behaviours of simulated actors that cause unsafe behaviour measured by the well established RSS safety metric. We demonstrate that by using the method we can more efficiently validate a system using a smaller number of test cases by focusing the simulation towards the worst case scenario, generating edge cases that correspond to unsafe situations.
To navigate through urban roads, an automated vehicle must be able to perceive and recognize objects in a three-dimensional environment. A high-level contextual understanding of the surroundings is necessary to plan and execute accurate driving maneuvers. This paper presents an approach to fuse different sensory information, Light Detection and Ranging (lidar) scans and camera images. The output of a convolutional neural network (CNN) is used as classifier to obtain the labels of the environment. The transference of semantic information between the labelled image and the lidar point cloud is performed in four steps: initially, we use heuristic methods to associate probabilities to all the semantic classes contained in the labelled images. Then, the lidar points are corrected to compensate for the vehicle's motion given the difference between the timestamps of each lidar scan and camera image. In a third step, we calculate the pixel coordinate for the corresponding camera image. In the last step we perform the transfer of semantic information from the heuristic probability images to the lidar frame, while removing the lidar information that is not visible to the camera. We tested our approach in the Usyd Dataset \cite{usyd_dataset}, obtaining qualitative and quantitative results that demonstrate the validity of our probabilistic sensory fusion approach.
This paper proposes an automated method to obtain the extrinsic calibration parameters between a camera and a 3D lidar with as low as 16 beams. We use a checkerboard as a reference to obtain features of interest in both sensor frames. The calibration board centre point and normal vector are automatically extracted from the lidar point cloud by exploiting the geometry of the board. The corresponding features in the camera image are obtained from the camera's extrinsic matrix. We explain the reasons behind selecting these features, and why they are more robust compared to other possibilities. To obtain the optimal extrinsic parameters, we choose a genetic algorithm to address the highly non-linear state space. The process is automated after defining the bounds of the 3D experimental region relative to the lidar, and the true board dimensions. In addition, the camera is assumed to be intrinsically calibrated. Our method requires a minimum of 3 checkerboard poses, and the calibration accuracy is demonstrated by evaluating our algorithm using real world and simulated features.
Centimeter level globally accurate and consistent maps for autonomous vehicles navigation has long been achieved by on board real-time kinematic(RTK)-GPS in open areas. However when dealing with urban environments, GPS will experience multipath and blockage in urban canyon, under bridges, inside tunnels and in underground environments. In this paper we present strategies to efficiently register local maps in geographical coordinate systems through the tactical integration of GPS and information extracted from precisely geo-referenced high resolution aerial orthogonal imagery. Dense lidar point clouds obtained from moving vehicle are projected down to horizontal plane, accurately registered and overlaid on aerial orthoimagery. Sparse, robust and long-term pole-like landmarks are used as anchor points to link lidar and aerial image sensing, and constrain the spatial uncertainties of remaining lidar points that cannot be directly measured and identified. We achieved 15-75cm absolute average global accuracy using precisely geo-referenced aerial imagery as ground truth. This is valuable in enabling the fusion of ground vehicle on-board sensor features with features extracted from aerial images such as traffic and lane markings. It is also useful for cooperative sensing to have an unbiased and accurate global reference. Experimental results are presented demonstrating the accuracy and consistency of the maps when operating in large areas.
Robustness and safety are crucial properties for the real-world application of autonomous vehicles. One of the most critical components of any autonomous system is localisation. During the last 20 years there has been significant progress in this area with the introduction of very efficient algorithms for mapping, localisation and SLAM. Many of these algorithms present impressive demonstrations for a particular domain, but fail to operate reliably with changes to the operating environment. The aspect of robustness has not received enough attention and localisation systems for self-driving vehicle applications are seldom evaluated for their robustness. In this paper we propose novel metrics to effectively quantify localisation robustness with or without an accurate ground truth. The experimental results present a comprehensive analysis of the application of these metrics against a number of well known localisation strategies.
One of the fundamental challenges in the design of perception systems for autonomous vehicles is validating the performance of each algorithm under a comprehensive variety of operating conditions. In the case of vision-based semantic segmentation, there are known issues when encountering new scenarios that are sufficiently different to the training data. In addition, even small variations in environmental conditions such as illumination and precipitation can affect the classification performance of the segmentation model. Given the reliance on visual information, these effects often translate into poor semantic pixel classification which can potentially lead to catastrophic consequences when driving autonomously. This paper presents a novel method for analysing the robustness of semantic segmentation models and provides a number of metrics to evaluate the classification performance over a variety of environmental conditions. The process incorporates an additional sensor (lidar) to automate the process, eliminating the need for labour-intensive hand labelling of validation data. The system integrity can be monitored as the performance of the vision sensors are validated against a different sensor modality. This is necessary for detecting failures that are inherent to vision technology. Experimental results are presented based on multiple datasets collected at different times of the year with different environmental conditions. These results show that the semantic segmentation performance varies depending on the weather, camera parameters, existence of shadows, etc.. The results also demonstrate how the metrics can be used to compare and validate the performance after making improvements to a model, and compare the performance of different networks.
To operate in an urban environment, an automated vehicle must be capable of accurately estimating its position within a global map reference frame. This is necessary for optimal path planning and safe navigation. To accomplish this over an extended period of time, the global map requires long-term maintenance. This includes the addition of newly observable features and the removal of transient features belonging to dynamic objects. The latter is especially important for the long-term use of the map as matching against a map with features that no longer exist can result in incorrect data associations, and consequently erroneous localisation. This paper addresses the problem of removing features from the map that correspond to objects that are no longer observable/present in the environment. This is achieved by assigning a single score which depends on the geometric distribution and characteristics when the features are re-detected (or not) on different occasions. Our approach not only eliminates ephemeral features, but also can be used as a reduction algorithm for highly dense maps. We tested our approach using half a year of weekly drives over the same 500-metre section of road in an urban environment. The results presented demonstrate the validity of the long-term approach to map maintenance.
Semantic segmentation using deep neural networks has been widely explored to generate high-level contextual information for autonomous vehicles. To acquire a complete $180^\circ$ semantic understanding of the forward surroundings, we propose to stitch semantic images from multiple cameras with varying orientations. However, previously trained semantic segmentation models showed unacceptable performance after significant changes to the camera orientations and the lighting conditions. To avoid time-consuming hand labeling, we explore and evaluate the use of data augmentation techniques, specifically skew and gamma correction, from a practical real-world standpoint to extend the existing model and provide more robust performance. The presented experimental results have shown significant improvements with varying illumination and camera perspective changes.
Understanding the intentions of drivers at intersections is a critical component for autonomous vehicles. Urban intersections that do not have traffic signals are a common epicentre of highly variable vehicle movement and interactions. We present a method for predicting driver intent at urban intersections through multi-modal trajectory prediction with uncertainty. Our method is based on recurrent neural networks combined with a mixture density network output layer. To consolidate the multi-modal nature of the output probability distribution, we introduce a clustering algorithm that extracts the set of possible paths that exist in the prediction output, and ranks them according to likelihood. To verify the method's performance and generalizability, we present a real-world dataset that consists of over 23,000 vehicles traversing five different intersections, collected using a vehicle mounted Lidar based tracking system. An array of metrics is used to demonstrate the performance of the model against several baselines.