Accurately forecasting the future movements of surrounding vehicles is essential for safe and efficient operations of autonomous driving cars. This task is difficult because a vehicle's moving trajectory is greatly determined by its driver's intention, which is often hard to estimate. By leveraging attention mechanisms along with long short-term memory (LSTM) networks, this work learns the relation between a driver's intention and the vehicle's changing positions relative to road infrastructures, and uses it to guide the prediction. Different from other state-of-the-art solutions, our work treats the on-road lanes as non-Euclidean structures, unfolds the vehicle's moving history to form a spatio-temporal graph, and uses methods from Graph Neural Networks to solve the problem. Not only is our approach a pioneering attempt in using non-Euclidean methods to process static environmental features around a predicted object, our model also outperforms other state-of-the-art models in several metrics. The practicability and interpretability analysis of the model shows great potential for large-scale deployment in various autonomous driving systems in addition to our own.
Tactical decision making and strategic motion planning for autonomous highway driving are challenging due to the complication of predicting other road users' behaviors, diversity of environments, and complexity of the traffic interactions. This paper presents a novel end-to-end continuous deep reinforcement learning approach towards autonomous cars' decision-making and motion planning. For the first time, we define both states and action spaces on the Frenet space to make the driving behavior less variant to the road curvatures than the surrounding actors' dynamics and traffic interactions. The agent receives time-series data of past trajectories of the surrounding vehicles and applies convolutional neural networks along the time channels to extract features in the backbone. The algorithm generates continuous spatiotemporal trajectories on the Frenet frame for the feedback controller to track. Extensive high-fidelity highway simulations on CARLA show the superiority of the presented approach compared with commonly used baselines and discrete reinforcement learning on various traffic scenarios. Furthermore, the proposed method's advantage is confirmed with a more comprehensive performance evaluation against 1000 randomly generated test scenarios.
End-to-end autonomous driving seeks to solve the perception, decision, and control problems in an integrated way, which can be easier to generalize at scale and be more adapting to new scenarios. However, high costs and risks make it very hard to train autonomous cars in the real world. Simulations can therefore be a powerful tool to enable training. Due to slightly different observations, agents trained and evaluated solely in simulation often perform well there but have difficulties in real-world environments. To tackle this problem, we propose a novel model-based reinforcement learning approach called Cycleconsistent World Models. Contrary to related approaches, our model can embed two modalities in a shared latent space and thereby learn from samples in one modality (e.g., simulated data) and be used for inference in different domain (e.g., real-world data). Our experiments using different modalities in the CARLA simulator showed that this enables CCWM to outperform state-of-the-art domain adaptation approaches. Furthermore, we show that CCWM can decode a given latent representation into semantically coherent observations in both modalities.
For autonomous cars to drive safely and effectively, they must anticipate the stochastic future trajectories of other agents in the scene, such as pedestrians and other cars. Forecasting such complex multi-modal distributions requires powerful probabilistic approaches. Normalizing flows have recently emerged as an attractive tool to model such distributions. However, when generating trajectory predictions from a flow model, a key drawback is that independent samples often do not adequately capture all the modes in the underlying distribution. We propose Diversity Sampling for Flow (DSF), a method for improving the quality and the diversity of trajectory samples from a pre-trained flow model. Rather than producing individual samples, DSF produces a set of trajectories in one shot. Given a pre-trained forecasting flow model, we train DSF using gradients from the model, to optimize an objective function that rewards high likelihood for individual trajectories in the predicted set, together with high spatial separation between trajectories. DSF is easy to implement, and we show that it offers a simple plug-in improvement for several existing flow-based forecasting models, achieving state-of-art results on two challenging vehicle and pedestrian forecasting benchmarks.
The world is looking for a new exciting form of transportation that will cut our travel times considerably. In 2021, the time has come for flying cars to become the new transportation system of this century. Electric vertical take-off and landing (eVTOL) vehicles, which are a type of flying cars, are predicted to be used for passenger and package transportation in dense cities. In order to fly safely and reliably, wireless communications for eVTOLs must be developed with stringent eVTOL communication requirements. Indeed, their communication needs to be ultra-reliable, secure with ultra-high data rate and low latency to fulfill various tasks such as autonomous driving, sharing a massive amount of data in a short amount of time, and high-level communication security. In this paper, we propose major key communication enablers for eVTOLs ranging from the architecture, air-interface, networking, frequencies, security, and computing. To show the relevance and the impact of one of the key enablers, we carried out comparative simulations to show the superiority compared to the current technology. We compared the usage of an air-based communication infrastructure with a tower mast in a realistic scenario involving eVTOLs, delivery drones, pedestrians, and vehicles.
Vibration patterns yield valuable information about the health state of a running machine, which is commonly exploited in predictive maintenance tasks for large industrial systems. However, the overhead, in terms of size, complexity and power budget, required by classical methods to exploit this information is often prohibitive for smaller-scale applications such as autonomous cars, drones or robotics. Here we propose a neuromorphic approach to perform vibration analysis using spiking neural networks that can be applied to a wide range of scenarios. We present a spike-based end-to-end pipeline able to detect system anomalies from vibration data, using building blocks that are compatible with analog-digital neuromorphic circuits. This pipeline operates in an online unsupervised fashion, and relies on a cochlea model, on feedback adaptation and on a balanced spiking neural network. We show that the proposed method achieves state-of-the-art performance or better against two publicly available data sets. Further, we demonstrate a working proof-of-concept implemented on an asynchronous neuromorphic processor device. This work represents a significant step towards the design and implementation of autonomous low-power edge-computing devices for online vibration monitoring.
The ability to detect and segment moving objects in a scene is essential for building consistent maps, making future state predictions, avoiding collisions, and planning. In this paper, we address the problem of moving object segmentation from 3D LiDAR scans. We propose a novel approach that pushes the current state of the art in LiDAR-only moving object segmentation forward to provide relevant information for autonomous robots and other vehicles. Instead of segmenting the point cloud semantically, i.e., predicting the semantic classes such as vehicles, pedestrians, buildings, roads, etc., our approach accurately segments the scene into moving and static objects, i.e., distinguishing between moving cars vs. parked cars. Our proposed approach exploits sequential range images from a rotating 3D LiDAR sensor as an intermediate representation combined with a convolutional neural network and runs faster than the frame rate of the sensor. We compare our approach to several other state-of-the-art methods showing superior segmentation quality in urban environments. Additionally, we created a new benchmark for LiDAR-based moving object segmentation based on SemanticKITTI. We publish it to allow other researchers to compare their approaches transparently and we will publish our code.
Perception technologies in Autonomous Driving are experiencing their golden age due to the advances in Deep Learning. Yet, most of these systems rely on the semantically rich information of RGB images. Deep Learning solutions applied to the data of other sensors typically mounted on autonomous cars (e.g. lidars or radars) are not explored much. In this paper we propose a novel solution to understand the dynamics of moving vehicles of the scene from only lidar information. The main challenge of this problem stems from the fact that we need to disambiguate the proprio-motion of the 'observer' vehicle from that of the external 'observed' vehicles. For this purpose, we devise a CNN architecture which at testing time is fed with pairs of consecutive lidar scans. However, in order to properly learn the parameters of this network, during training we introduce a series of so-called pretext tasks which also leverage on image data. These tasks include semantic information about vehicleness and a novel lidar-flow feature which combines standard image-based optical flow with lidar scans. We obtain very promising results and show that including distilled image information only during training, allows improving the inference results of the network at test time, even when image data is no longer used.
Global Navigation Satellite Systems (GNSS) brought navigation to the masses. Coupled with smartphones, the blue dot in the palm of our hands has forever changed the way we interact with the world. Looking forward, cyber-physical systems such as self-driving cars and aerial mobility are pushing the limits of what localization technologies including GNSS can provide. This autonomous revolution requires a solution that supports safety-critical operation, centimeter positioning, and cyber-security for millions of users. To meet these demands, we propose a navigation service from Low Earth Orbiting (LEO) satellites which deliver precision in-part through faster motion, higher power signals for added robustness to interference, constellation autonomous integrity monitoring for integrity, and encryption / authentication for resistance to spoofing attacks. This paradigm is enabled by the 'New Space' movement, where highly capable satellites and components are now built on assembly lines and launch costs have decreased by more than tenfold. Such a ubiquitous positioning service enables a consistent and secure standard where trustworthy information can be validated and shared, extending the electronic horizon from sensor line of sight to an entire city. This enables the situational awareness needed for true safe operation to support autonomy at scale.
Accurate scene understanding from multiple sensors mounted on cars is a key requirement for autonomous driving systems. Nowadays, this task is mainly performed through data-hungry deep learning techniques that need very large amounts of data to be trained. Due to the high cost of performing segmentation labeling, many synthetic datasets have been proposed. However, most of them miss the multi-sensor nature of the data, and do not capture the significant changes introduced by the variation of daytime and weather conditions. To fill these gaps, we introduce SELMA, a novel synthetic dataset for semantic segmentation that contains more than 30K unique waypoints acquired from 24 different sensors including RGB, depth, semantic cameras and LiDARs, in 27 different atmospheric and daytime conditions, for a total of more than 20M samples. SELMA is based on CARLA, an open-source simulator for generating synthetic data in autonomous driving scenarios, that we modified to increase the variability and the diversity in the scenes and class sets, and to align it with other benchmark datasets. As shown by the experimental evaluation, SELMA allows the efficient training of standard and multi-modal deep learning architectures, and achieves remarkable results on real-world data. SELMA is free and publicly available, thus supporting open science and research.