Every autonomous driving dataset has a different configuration of sensors, originating from distinct geographic regions and covering various scenarios. As a result, 3D detectors tend to overfit the datasets they are trained on. This causes a drastic decrease in accuracy when the detectors are trained on one dataset and tested on another. We observe that lidar scan pattern differences form a large component of this reduction in performance. We address this in our approach, SEE-VCN, by designing a novel viewer-centred surface completion network (VCN) to complete the surfaces of objects of interest within an unsupervised domain adaptation framework, SEE. With SEE-VCN, we obtain a unified representation of objects across datasets, allowing the network to focus on learning geometry, rather than overfitting on scan patterns. By adopting a domain-invariant representation, SEE-VCN can be classed as a multi-target domain adaptation approach where no annotations or re-training is required to obtain 3D detections for new scan patterns. Through extensive experiments, we show that our approach outperforms previous domain adaptation methods in multiple domain adaptation settings. Our code and data are available at https://github.com/darrenjkt/SEE-VCN.
The paper addresses the vehicle-to-X (V2X) data fusion for cooperative or collective perception (CP). This emerging and promising intelligent transportation systems (ITS) technology has enormous potential for improving efficiency and safety of road transportation. Recent advances in V2X communication primarily address the definition of V2X messages and data dissemination amongst ITS stations (ITS-Ss) in a traffic environment. Yet, a largely unsolved problem is how a connected vehicle (CV) can efficiently and consistently fuse its local perception information with the data received from other ITS-Ss. In this paper, we present a novel data fusion framework to fuse the local and V2X perception data for CP that considers the presence of cross-correlation. The proposed approach is validated through comprehensive results obtained from numerical simulation, CARLA simulation, and real-world experimentation that incorporates V2X-enabled intelligent platforms. The real-world experiment includes a CV, a connected and automated vehicle (CAV), and an intelligent roadside unit (IRSU) retrofitted with vision and lidar sensors. We also demonstrate how the fused CP information can improve the awareness of vulnerable road users (VRU) for CV/CAV, and how this information can be considered in path planning/decision making within the CAV to facilitate safe interactions.
Sampling discrepancies between different manufacturers and models of lidar sensors result in inconsistent representations of objects. This leads to performance degradation when 3D detectors trained for one lidar are tested on other types of lidars. Remarkable progress in lidar manufacturing has brought about advances in mechanical, solid-state, and recently, adjustable scan pattern lidars. For the latter, existing works often require fine-tuning the model each time scan patterns are adjusted, which is infeasible. We explicitly deal with the sampling discrepancy by proposing a novel unsupervised multi-target domain adaptation framework, SEE, for transferring the performance of state-of-the-art 3D detectors across both fixed and flexible scan pattern lidars without requiring fine-tuning of models by end-users. Our approach interpolates the underlying geometry and normalizes the scan pattern of objects from different lidars before passing them to the detection network. We demonstrate the effectiveness of SEE on public datasets, achieving state-of-the-art results, and additionally provide quantitative results on a novel high-resolution lidar to prove the industry applications of our framework. This dataset and our code will be made publicly available.
Social acceptance is a major hurdle for autonomous vehicle technology, central to which is ensuring both passengers and nearby pedestrians feel safe. This idea of `feeling safe' and perceived safety is highly subjective and rooted in human intuition. As such, traditional analytical approaches to autonomous navigation often fail to cater for the social expectations of individuals. Therefore, this paper proposes an approach to capture the complexity of social expectations and integrate this complexity into a 3-layered Contextual Speed Controller. The layers were; the legal road speed limit, the socially acceptable speed given the number of nearby pedestrians, and the socially acceptable speed based on proximity to nearby pedestrians. An implementation of this layered approach was tested in areas of both low and high vehicle-pedestrian interactions. From the experiments conducted, the lower two layers were seen working in tandem to modulate the vehicle speed to appropriate levels that mimicked conservative human driver behaviour. In summary, this work quantified the relationship between pedestrian context and socially acceptable vehicle speeds, allowing for more perceivably safe autonomous driving. Furthermore, the need for different driving schemes for navigating different road environments was identified.
We propose a robust calibration pipeline that optimises the selection of calibration samples for the estimation of calibration parameters that fit the entire scene. We minimise user error by automating the data selection process according to a metric, called Variability of Quality (VOQ) that gives a score to each calibration set of samples. We show that this VOQ score is correlated with the estimated calibration parameter's ability to generalise well to the entire scene, thereby overcoming the overfitting problems of existing calibration algorithms. Our approach has the benefits of simplifying the calibration process for practitioners of any calibration expertise level and providing an objective measure of the quality for our calibration pipeline's input and output data. We additionally use a novel method of assessing the accuracy of the calibration parameters. It involves computing reprojection errors for the entire scene to ensure that the parameters are well fitted to all features in the scene. Our proposed calibration pipeline takes 90s, and obtains an average reprojection error of 1-1.2cm, with standard deviation of 0.4-0.5cm over 46 poses evenly distributed in a scene. This process has been validated by experimentation on a high resolution, software definable lidar, Baraja Spectrum-Scan; and a low, fixed resolution lidar, Velodyne VLP-16. We have shown that despite the vast differences in lidar technologies, our proposed approach manages to estimate robust calibration parameters for both. Our code and data set used for this paper are made available as open-source.
Seamlessly operating an autonomous vehicle in a crowded pedestrian environment is a very challenging task. This is because human movement and interactions are very hard to predict in such environments. Recent work has demonstrated that reinforcement learning-based methods have the ability to learn to drive in crowds. However, these methods can have very poor performance due to inaccurate predictions of the pedestrians' future state as human motion prediction has a large variance. To overcome this problem, we propose a new method, SARL-SGAN-KCE, that combines a deep socially aware attentive value network with a human multimodal trajectory prediction model to help identify the optimal driving policy. We also introduce a novel technique to extend the discrete action space with minimal additional computational requirements. The kinematic constraints of the vehicle are also considered to ensure smooth and safe trajectories. We evaluate our method against the state of art methods for crowd navigation and provide an ablation study to show that our method is safer and closer to human behaviour.
Autonomous vehicle navigation in shared pedestrian environments requires the ability to predict future crowd motion both accurately and with minimal delay. Understanding the uncertainty of the prediction is also crucial. Most existing approaches however can only estimate uncertainty through repeated sampling of generative models. Additionally, most current predictive models are trained on datasets that assume complete observability of the crowd using an aerial view. These are generally not representative of real-world usage from a vehicle perspective, and can lead to the underestimation of uncertainty bounds when the on-board sensors are occluded. Inspired by prior work in motion prediction using spatio-temporal graphs, we propose a novel Graph Convolutional Neural Network (GCNN)-based approach, Attentional-GCNN, which aggregates information of implicit interaction between pedestrians in a crowd by assigning attention weight in edges of the graph. Our model can be trained to either output a probabilistic distribution or faster deterministic prediction, demonstrating applicability to autonomous vehicle use cases where either speed or accuracy with uncertainty bounds are required. To further improve the training of predictive models, we propose an automatically labelled pedestrian dataset collected from an intelligent vehicle platform representative of real-world use. Through experiments on a number of datasets, we show our proposed method achieves an improvement over the state of art by 10% Average Displacement Error (ADE) and 12% Final Displacement Error (FDE) with fast inference speeds.
Cooperative perception, or collective perception (CP) is an emerging and promising technology for intelligent transportation systems (ITS). It enables an ITS station (ITS-S) to share its local perception information with others by means of vehicle-to-X (V2X) communication, thereby achieving improved efficiency and safety in road transportation. In this paper, we present our recent progress on the development of a connected and automated vehicle (CAV) and intelligent roadside unit (IRSU). We present three different experiments to demonstrate the use of CP service within intelligent infrastructure to improve awareness of vulnerable road users (VRU) and thus safety for CAVs in various traffic scenarios. We demonstrate in the experiments that a connected vehicle (CV) can "see" a pedestrian around the corners. More importantly, we demonstrate how CAVs can autonomously and safely interact with walking and running pedestrians, relying only on the CP information from the IRSU through vehicle-to-infrastructure (V2I) communication. This is one of the first demonstrations of urban vehicle automation using only CP information. We also address in the paper the handling of collective perception messages (CPMs) received from the IRSU, and passing them through a pipeline of CP information coordinate transformation with uncertainty, multiple road user tracking, and eventually path planning/decision making within the CAV. The experimental results were obtained with manually driven CV, fully autonomous CAV, and an IRSU retrofitted with vision and laser sensors and a road user tracking system.
For autonomous vehicles to operate persistently in a typical urban environment, it is essential to have high accuracy position information. This requires a mapping and localisation system that can adapt to changes over time. A localisation approach based on a single-survey map will not be suitable for long-term operation as it does not incorporate variations in the environment. In this paper, we present new algorithms to maintain a featured-based map. A map maintenance pipeline is proposed that can continuously update a map with the most relevant features taking advantage of the changes in the surroundings. Our pipeline detects and removes transient features based on their geometrical relationships with the vehicle's pose. Newly identified features became part of a new feature map and are assessed by the pipeline as candidates for the localisation map. By purging out-of-date features and adding newly detected features, we continually update the prior map to more accurately represent the most recent environment. We have validated our approach using the USyd Campus Dataset, which includes more than 18 months of data. The results presented demonstrate that our maintenance pipeline produces a resilient map which can provide sustained localisation performance over time.
Understanding and predicting the intention of pedestrians is essential to enable autonomous vehicles and mobile robots to navigate crowds. This problem becomes increasingly complex when we consider the uncertainty and multimodality of pedestrian motion, as well as the implicit interactions between members of a crowd, including any response to a vehicle. Our approach, Probabilistic Crowd GAN, extends recent work in trajectory prediction, combining Recurrent Neural Networks (RNNs) with Mixture Density Networks (MDNs) to output probabilistic multimodal predictions, from which likely modal paths are found and used for adversarial training. We also propose the use of Graph Vehicle-Pedestrian Attention Network (GVAT), which models social interactions and allows input of a shared vehicle feature, showing that inclusion of this module leads to improved trajectory prediction both with and without the presence of a vehicle. Through evaluation on various datasets, we demonstrate improvements on the existing state of the art methods for trajectory prediction and illustrate how the true multimodal and uncertain nature of crowd interactions can be directly modelled.