Tactical decision making and strategic motion planning for autonomous highway driving are challenging due to the difficulty of predicting other road users' behaviors, the diversity of environments, and the complexity of traffic interactions. This paper presents a novel end-to-end continuous deep reinforcement learning approach to decision-making and motion planning for autonomous cars. For the first time, we define both the state and action spaces on the Frenet space, making the driving behavior less sensitive to road curvature than to the surrounding actors' dynamics and traffic interactions. The agent receives time-series data of the past trajectories of the surrounding vehicles and applies convolutional neural networks along the time channels to extract features in the backbone. The algorithm generates continuous spatiotemporal trajectories on the Frenet frame for the feedback controller to track. Extensive high-fidelity highway simulations on CARLA show the superiority of the presented approach over commonly used baselines and discrete reinforcement learning in various traffic scenarios. Furthermore, the proposed method's advantage is confirmed by a more comprehensive performance evaluation on 1000 randomly generated test scenarios.
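To make the Frenet representation concrete, the following is a minimal sketch of converting a Cartesian position into Frenet coordinates (arc length s along a reference path and signed lateral offset d). It assumes the reference path is a simple polyline of waypoints; the function name `cartesian_to_frenet` and the sign convention (positive d to the left of the travel direction) are illustrative choices, not taken from the paper.

```python
import math


def cartesian_to_frenet(point, path):
    """Project `point` (x, y) onto a polyline `path` and return (s, d).

    s: arc length along the path up to the projection of the point.
    d: signed lateral offset, positive to the left of the travel direction
       (an assumed convention for this sketch).
    """
    px, py = point
    best = None  # (squared distance, s, d) of the closest projection so far
    s_start = 0.0  # arc length at the start of the current segment
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip duplicate waypoints
        # Fraction along the segment of the closest point, clamped to [0, 1].
        t = ((px - x0) * dx + (py - y0) * dy) / (seg_len ** 2)
        t = max(0.0, min(1.0, t))
        cx, cy = x0 + t * dx, y0 + t * dy
        dist_sq = (px - cx) ** 2 + (py - cy) ** 2
        if best is None or dist_sq < best[0]:
            # The sign of the 2D cross product separates left (+) from right (-).
            cross = dx * (py - y0) - dy * (px - x0)
            d = math.copysign(math.sqrt(dist_sq), cross)
            best = (dist_sq, s_start + t * seg_len, d)
        s_start += seg_len
    if best is None:
        raise ValueError("path needs at least two distinct waypoints")
    return best[1], best[2]


# Hypothetical L-shaped reference path: 10 m along +x, then 10 m along +y.
reference_path = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
s_left, d_left = cartesian_to_frenet((5.0, 2.0), reference_path)
s_right, d_right = cartesian_to_frenet((12.0, 5.0), reference_path)
```

In this representation a vehicle that keeps its lane on a curved road has a nearly constant d, which is one way to see why behavior expressed in Frenet coordinates depends less on road curvature.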
Over the past decade, many research articles have been published in the area of autonomous driving. However, most of them focus only on a specific technological area, such as visual environment perception or vehicle control. Furthermore, due to fast advances in self-driving car technology, such articles quickly become obsolete. In this paper, we give a brief but comprehensive overview of the key ingredients of autonomous cars (ACs), including driving automation levels, AC sensors, AC software, open source datasets, industry leaders, AC applications and existing challenges.
In order to be globally deployed, autonomous cars must guarantee the safety of pedestrians. This is why forecasting pedestrians' intentions sufficiently in advance is one of the most critical and challenging tasks for autonomous vehicles. This work addresses this problem by jointly predicting the intention and visual states of pedestrians. In terms of visual states, whereas previous work focused on x-y coordinates, we also predict the size and indeed the whole bounding box of the pedestrian. The method is a recurrent neural network trained in a multi-task learning approach. It has one head that predicts the intention of the pedestrian for each of its future positions and another that predicts the visual states of the pedestrian. Experiments on the JAAD dataset show that our method outperforms previous works on intention prediction. Also, despite its simple architecture (more than 2 times faster), its bounding box prediction performance is comparable to that of much more complex architectures. Our code is available online.
Recent advances in Artificial Intelligence (AI), especially in Machine Learning (ML), have introduced various practical applications (e.g., virtual personal assistants and autonomous cars) that enhance the experience of everyday users. However, modern ML technologies like Deep Learning require considerable technical expertise and resources to develop, train and deploy such models, making effective reuse of the ML models a necessity. Such discovery and reuse by practitioners and researchers are being addressed by public ML package repositories, which bundle up pre-trained models into packages for publication. Since such repositories are a recent phenomenon, there is no empirical data on their current state and challenges. Hence, this paper conducts an exploratory study that analyzes the structure and contents of two popular ML package repositories, TFHub and PyTorch Hub, comparing their information elements (features and policies), package organization, package manager functionalities and usage contexts against popular software package repositories (npm, PyPI, and CRAN). Through these studies, we have identified unique SE practices and challenges for sharing ML packages. These findings and implications would be useful for data scientists, researchers and software developers who intend to use these shared ML packages.
Nowadays, many captured images are "observed" only by machines and not by humans, for example, by the cameras of robots or autonomous cars. High-level machine vision models, such as object recognition, assume images are transformed to some canonical image space by the camera ISP. However, the camera ISP is optimized for producing images that are visually pleasing to human observers, not for machines. Thus, one may spare the ISP compute time and apply the vision models directly to the raw data. Yet, it has been shown that training such models directly on RAW images results in a performance drop. To mitigate this drop in performance (without the need to annotate RAW data), we use a dataset of RAW and RGB image pairs, which can be easily acquired with no human labeling. We then train a model that is applied directly to the RAW data by using knowledge distillation, such that the model's predictions for RAW images are aligned with the predictions of an off-the-shelf pre-trained model for processed RGB images. Our experiments show that our performance on RAW images is significantly better than that of a model trained on labeled RAW images. It also reasonably matches the predictions of a pre-trained model on processed RGB images, while saving the ISP compute overhead.
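The alignment step described above can be sketched as a distillation loss on a single RAW/RGB pair. The abstract does not state which loss is used, so the KL divergence between softened class distributions below is an assumption (a common choice for knowledge distillation), and the names `distillation_loss`, `student_raw_logits` and `teacher_rgb_logits` are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def distillation_loss(student_raw_logits, teacher_rgb_logits, temperature=1.0):
    """KL(teacher || student) for one RAW/RGB image pair.

    The teacher sees the processed RGB image and the student sees the RAW
    image of the same scene, so no human label is needed.
    Assumption: KL divergence is the alignment criterion.
    """
    log_p = log_softmax(teacher_rgb_logits, temperature)  # teacher (fixed)
    log_q = log_softmax(student_raw_logits, temperature)  # student (trained)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Hypothetical logits for a 3-class problem.
teacher_logits = [2.0, 0.5, -1.0]
aligned_loss = distillation_loss([2.0, 0.5, -1.0], teacher_logits)
misaligned_loss = distillation_loss([-1.0, 0.5, 2.0], teacher_logits)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is the signal a training loop would minimize over the RAW/RGB pairs.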
Once an academic venture, autonomous driving has received unparalleled corporate funding in the last decade. Still, the operating conditions of current autonomous cars are mostly restricted to ideal scenarios. This means that driving in challenging illumination conditions such as night, sunrise, and sunset remains an open problem. In these cases, standard cameras are pushed to their limits in terms of low light and high dynamic range performance. To address these challenges, we propose DSEC, a new dataset that contains such demanding illumination conditions and provides a rich set of sensory data. DSEC offers data from a wide-baseline stereo setup of two color frame cameras and two high-resolution monochrome event cameras. In addition, we collect lidar data and RTK GPS measurements, both hardware synchronized with all camera data. One of the distinctive features of this dataset is the inclusion of high-resolution event cameras. Event cameras have received increasing attention for their high temporal resolution and high dynamic range performance. However, due to their novelty, event camera datasets in driving scenarios are rare. This work presents the first high-resolution, large-scale stereo dataset with event cameras. The dataset contains 53 sequences collected by driving in a variety of illumination conditions and provides ground truth disparity for the development and evaluation of event-based stereo algorithms.
Radar sensors are crucial for the environment perception of driver assistance systems as well as autonomous cars. Key performance factors are a fine range resolution and the ability to measure velocity directly. With a rising number of radar sensors and the so far unregulated automotive radar frequency band, mutual interference is inevitable and must be dealt with. Sensors must be capable of detecting, or even mitigating, the harmful effects of interference, which include decreased detection sensitivity. In this paper, we evaluate a Convolutional Neural Network (CNN)-based approach for interference mitigation on real-world radar measurements. We combine real measurements with simulated interference in order to create input-output data suitable for training the model. We analyze the relation between performance and model complexity on simulated and measurement data, based on an extensive parameter search. Further, a finite sample size performance comparison shows the effectiveness of the model trained on either simulated or real data as well as for transfer learning. A comparative performance analysis with the state of the art emphasizes the potential of CNN-based models for interference mitigation and denoising of real-world measurements, also considering the resource constraints of the hardware.
For autonomous cars to drive safely and effectively, they must anticipate the stochastic future trajectories of other agents in the scene, such as pedestrians and other cars. Forecasting such complex multi-modal distributions requires powerful probabilistic approaches. Normalizing flows have recently emerged as an attractive tool to model such distributions. However, when generating trajectory predictions from a flow model, a key drawback is that independent samples often do not adequately capture all the modes in the underlying distribution. We propose Diversity Sampling for Flow (DSF), a method for improving the quality and the diversity of trajectory samples from a pre-trained flow model. Rather than producing individual samples, DSF produces a set of trajectories in one shot. Given a pre-trained forecasting flow model, we train DSF using gradients from the model to optimize an objective function that rewards high likelihood for individual trajectories in the predicted set, together with high spatial separation between trajectories. DSF is easy to implement, and we show that it offers a simple plug-in improvement for several existing flow-based forecasting models, achieving state-of-the-art results on two challenging vehicle and pedestrian forecasting benchmarks.
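A set-level objective of the kind described above can be sketched as a likelihood term plus a diversity term. The abstract does not give the exact form, so this is only an illustration: the mean log-likelihood, the minimum pairwise distance as the separation measure, the weight `diversity_weight`, and the stand-in `log_likelihood` callable (in place of a pre-trained flow model) are all assumptions.

```python
import math
from itertools import combinations


def trajectory_distance(a, b):
    """Mean Euclidean distance between matching waypoints of two trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def set_objective(trajectories, log_likelihood, diversity_weight=1.0):
    """Score a whole set of predicted trajectories at once.

    Likelihood term: mean log-likelihood of the individual trajectories.
    Diversity term: smallest pairwise distance in the set (assumed measure),
    so the score drops when any two trajectories collapse onto each other.
    """
    if len(trajectories) < 2:
        raise ValueError("need at least two trajectories to measure separation")
    likelihood_term = sum(log_likelihood(t) for t in trajectories) / len(trajectories)
    diversity_term = min(
        trajectory_distance(a, b) for a, b in combinations(trajectories, 2)
    )
    return likelihood_term + diversity_weight * diversity_term


def uniform_log_likelihood(trajectory):
    """Toy stand-in for a flow model: every trajectory is equally likely."""
    return 0.0


straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
left_turn = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
collapsed_score = set_objective([straight, straight], uniform_log_likelihood)
spread_score = set_objective([straight, left_turn], uniform_log_likelihood)
```

With equal likelihoods, the set that covers two distinct maneuvers scores higher than the set that repeats one, which mirrors the stated goal of capturing multiple modes in one shot.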
Real-time 3D object detection is crucial for autonomous cars. Achieving promising performance with high efficiency, voxel-based approaches have received considerable attention. However, previous methods model the input space with features extracted from equally divided sub-regions without considering that point clouds are generally non-uniformly distributed over the space. To address this issue, we propose a novel 3D object detection framework with dynamic information modeling. The proposed framework is designed in a coarse-to-fine manner. Coarse predictions are generated in the first stage via a voxel-based region proposal network. We introduce InfoFocus, which improves the coarse detections by adaptively refining features guided by point cloud density information. Experiments are conducted on the large-scale nuScenes 3D detection benchmark. Results show that our framework achieves state-of-the-art performance at 31 FPS and improves on our baseline significantly, by 9.0% mAP, on the nuScenes test set.
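The non-uniformity that motivates the method can be illustrated by counting points per voxel on an equally divided grid. This is only a sketch of the density observation, not of InfoFocus itself; the function name `voxel_point_counts` and the toy point cloud are hypothetical.

```python
import math
from collections import Counter


def voxel_point_counts(points, voxel_size):
    """Count how many 3D points fall into each cubic voxel of side `voxel_size`.

    Returns a Counter keyed by integer voxel index (ix, iy, iz).
    """
    counts = Counter()
    for x, y, z in points:
        index = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        counts[index] += 1
    return counts


# Toy point cloud: dense near the sensor, sparse farther away.
toy_points = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (0.9, 0.5, 0.2), (5.5, 0.0, 0.0)]
density = voxel_point_counts(toy_points, voxel_size=1.0)
```

Equal-sized voxels end up with very different point counts, so features pooled per voxel rest on very different amounts of evidence; a density-guided refinement stage can take this into account.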
Widespread adoption of autonomous cars will require greater confidence in their safety than is currently possible. Certified control is a new safety architecture whose goal is two-fold: to achieve a very high level of safety, and to provide a framework for justifiable confidence in that safety. The key idea is a runtime monitor that acts, along with sensor hardware and low-level control and actuators, as a small trusted base, ensuring the safety of the system as a whole. Unfortunately, in current systems complex perception makes the verification even of a runtime monitor challenging. Unlike traditional runtime monitoring, therefore, a certified control monitor does not perform perception and analysis itself. Instead, the main controller assembles evidence that the proposed action is safe into a certificate that is then checked independently by the monitor. This exploits the classic gap between the costs of finding and checking. The controller is assigned the task of finding the certificate, and can thus use the most sophisticated algorithms available (including learning-enabled software); the monitor is assigned only the task of checking, and can thus run quickly and be smaller and formally verifiable. This paper explains the key ideas of certified control and illustrates them with a certificate for LiDAR data and its formal verification. It shows how the architecture dramatically reduces the amount of code to be verified, providing an end-to-end safety analysis that would likely not be achievable in a traditional architecture.
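The split between finding and checking can be sketched with a toy monitor. The certificate format below is hypothetical and far simpler than a real LiDAR certificate: the controller claims a clear distance ahead and attaches the LiDAR ranges in its driving corridor as evidence, and the monitor only verifies that the evidence supports the claim and that the vehicle can stop within it. The names `Certificate` and `monitor_accepts`, and the default braking deceleration, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Certificate:
    """Hypothetical evidence bundle assembled by the main controller."""
    speed_mps: float                     # proposed speed for the next step
    clear_distance_m: float              # claimed obstacle-free distance ahead
    corridor_ranges_m: Sequence[float]   # LiDAR ranges inside the corridor


def monitor_accepts(cert, max_brake_mps2=6.0, min_returns=1):
    """Check a certificate without doing any perception.

    The monitor never searches for obstacles; it only confirms that
    (1) there is enough evidence,
    (2) no LiDAR return is closer than the claimed clear distance, and
    (3) the stopping distance v^2 / (2a) fits inside that distance.
    """
    if len(cert.corridor_ranges_m) < min_returns:
        return False
    if any(r < cert.clear_distance_m for r in cert.corridor_ranges_m):
        return False
    stopping_distance = cert.speed_mps ** 2 / (2.0 * max_brake_mps2)
    return stopping_distance < cert.clear_distance_m


safe = monitor_accepts(Certificate(10.0, 20.0, [25.0, 30.0]))
too_fast = monitor_accepts(Certificate(20.0, 20.0, [25.0, 30.0]))
bad_evidence = monitor_accepts(Certificate(10.0, 20.0, [15.0, 30.0]))
```

The checker is a few comparisons and one formula, which is the point of the architecture: the code that must be trusted and verified stays small, while the controller that produces the certificate can be arbitrarily complex.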