In recent years, Convolutional Neural Networks (ConvNets) have become an enabling technology for a wide range of novel embedded Artificial Intelligence systems. Across the range of applications, the performance needs vary significantly, from high-throughput video surveillance to the very low-latency requirements of autonomous cars. In this context, FPGAs can provide a potential platform that can be optimally configured based on the different performance needs. However, the complexity of ConvNet models keeps increasing making their mapping to an FPGA device a challenging task. This work presents fpgaConvNet, an end-to-end framework for mapping ConvNets on FPGAs. The proposed framework employs an automated design methodology based on the Synchronous Dataflow (SDF) paradigm and defines a set of SDF transformations in order to efficiently explore the architectural design space. By selectively optimising for throughput, latency or multiobjective criteria, the presented tool is able to efficiently explore the design space and generate hardware designs from high-level ConvNet specifications, explicitly optimised for the performance metric of interest. Overall, our framework yields designs that improve the performance by up to 6.65x over highly optimised embedded GPU designs for the same power constraints in embedded environments.
Motion prediction (MP) of multiple agents is a crucial task in arbitrarily complex environments, from social robots to self-driving cars. Current approaches tackle this problem using end-to-end networks, where the input data is usually a rendered top-view of the scene and the past trajectories of all the agents; leveraging this information is a must to obtain optimal performance. In that sense, a reliable Autonomous Driving (AD) system must produce reasonable predictions on time, however, despite many of these approaches use simple ConvNets and LSTMs, models might not be efficient enough for real-time applications when using both sources of information (map and trajectory history). Moreover, the performance of these models highly depends on the amount of training data, which can be expensive (particularly the annotated HD maps). In this work, we explore how to achieve competitive performance on the Argoverse 1.0 Benchmark using efficient attention-based models, which take as input the past trajectories and map-based features from minimal map information to ensure efficient and reliable MP. These features represent interpretable information as the driveable area and plausible goal points, in opposition to black-box CNN-based methods for map processing.
The paper presents an efficient real-time scheduling algorithm for intelligent real-time edge services, defined as those that perform machine intelligence tasks, such as voice recognition, LIDAR processing, or machine vision, on behalf of local embedded devices that are themselves unable to support extensive computations. The work contributes to a recent direction in real-time computing that develops scheduling algorithms for machine intelligence tasks with anytime prediction. We show that deep neural network workflows can be cast as imprecise computations, each with a mandatory part and (several) optional parts whose execution utility depends on input data. The goal of the real-time scheduler is to maximize the average accuracy of deep neural network outputs while meeting task deadlines, thanks to opportunistic shedding of the least necessary optional parts. The work is motivated by the proliferation of increasingly ubiquitous but resource-constrained embedded devices (for applications ranging from autonomous cars to the Internet of Things) and the desire to develop services that endow them with intelligence. Experiments on recent GPU hardware and a state of the art deep neural network for machine vision illustrate that our scheme can increase the overall accuracy by 10%-20% while incurring (nearly) no deadline misses.
Safety and tracking stability are crucial for safety-critical systems such as self-driving cars, autonomous mobile robots, industrial manipulators, etc. To efficiently control safety-critical systems to ensure their safety and achieve tracking stability, accurate system dynamic models are usually required. However, accurate system models are not always available in practice. In this paper, a learning-based safety-stability-driven control (LBSC) algorithm is presented to guarantee safety and tracking stability for nonlinear safety-critical systems subject to control input constraints under model uncertainties. Gaussian Processes (GPs) are employed to learn the model error between the nominal model and the actual system dynamics, and the estimated mean and variance of the model error are used to quantify a high-confidence uncertainty bound. Using this estimated uncertainty bound, a safety barrier constraint is devised to ensure safety, and a stability constraint is developed to achieve rapid and accurate tracking. Then the proposed LBSC method is formulated as a quadratic program incorporating the safety barrier, the stability constraint and the control constraints. The effectiveness of the LBSC method is illustrated on a safety-critical connected cruise control (CCC) system simulatior under model uncertainties.
During the last half decade, convolutional neural networks (CNNs) have triumphed over semantic segmentation, which is one of the core tasks in many applications such as autonomous driving. However, to train CNNs requires a considerable amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNNs on photo-realistic synthetic imagery with computer-generated annotations. Despite this, the domain mismatch between the real images and the synthetic data cripples the models' performance. Hence, we propose a curriculum-style learning approach to minimize the domain gap in urban scenery semantic segmentation. The curriculum domain adaptation solves easy tasks first to infer necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train a segmentation network while regularizing its predictions in the target domain to follow those inferred properties. In experiments, our method outperforms the baselines on two datasets and two backbone networks. We also report extensive ablation studies about our approach.
Studies of object detection and localization, particularly pedestrian detection have received considerable attention in recent times due to its several prospective applications such as surveillance, driving assistance, autonomous cars, etc. Also, a significant trend of latest research studies in related problem areas is the use of sophisticated Deep Learning based approaches to improve the benchmark performance on various standard datasets. A trade-off between the speed (number of video frames processed per second) and detection accuracy has often been reported in the existing literature. In this article, we present a new but simple deep learning based strategy for pedestrian detection that improves this trade-off. Since training of similar models using publicly available sample datasets failed to improve the detection performance to some significant extent, particularly for the instances of pedestrians of smaller sizes, we have developed a new sample dataset consisting of more than 80K annotated pedestrian figures in videos recorded under varying traffic conditions. Performance of the proposed model on the test samples of the new dataset and two other existing datasets, namely Caltech Pedestrian Dataset (CPD) and CityPerson Dataset (CD) have been obtained. Our proposed system shows nearly 16\% improvement over the existing state-of-the-art result.
Accurate localization on autonomous driving cars is essential for autonomy and driving safety, especially for complex urban streets and search-and-rescue subterranean environments where high-accurate GPS is not available. However current odometry estimation may introduce the drifting problems in long-term navigation without robust global localization. The main challenges involve scene divergence under the interference of dynamic environments and effective perception of observation and object layout variance from different viewpoints. To tackle these challenges, we present PSE-Match, a viewpoint-free place recognition method based on parallel semantic analysis of isolated semantic attributes from 3D point-cloud models. Compared with the original point cloud, the observed variance of semantic attributes is smaller. PSE-Match incorporates a divergence place learning network to capture different semantic attributes parallelly through the spherical harmonics domain. Using both existing benchmark datasets and two in-field collected datasets, our experiments show that the proposed method achieves above 70% average recall with top one retrieval and above 95% average recall with top ten retrieval cases. And PSE-Match has also demonstrated an obvious generalization ability with a limited training dataset.
Deep Neural Networks (DNN) applications are increasingly becoming a part of our everyday life, from medical applications to autonomous cars. Traditional validation of DNN relies on accuracy measures, however, the existence of adversarial examples has highlighted the limitations of these accuracy measures, raising concerns especially when DNN are integrated into safety-critical systems. In this paper, we present HOMRS, an approach to boost metamorphic testing by automatically building a small optimized set of high order metamorphic relations from an initial set of elementary metamorphic relations. HOMRS' backbone is a multi-objective search; it exploits ideas drawn from traditional systems testing such as code coverage, test case, and path diversity. We applied HOMRS to LeNet5 DNN with MNIST dataset and we report evidence that it builds a small but effective set of high order transformations achieving a 95% kill ratio. Five raters manually labeled a pool of images before and after high order transformation; Fleiss' Kappa and statistical tests confirmed that they are metamorphic properties. HOMRS built-in relations are also effective to confront adversarial or out-of-distribution examples; HOMRS detected 92% of randomly sampled out-of-distribution images. HOMRS transformations are also suitable for online real-time use.
3D LiDAR (light detection and ranging) based semantic segmentation is important in scene understanding for many applications, such as auto-driving and robotics. For example, for autonomous cars equipped with RGB cameras and LiDAR, it is crucial to fuse complementary information from different sensors for robust and accurate segmentation. Existing fusion-based methods, however, may not achieve promising performance due to the vast difference between two modalities. In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) to exploit perceptual information from two modalities, namely, appearance information from RGB images and spatio-depth information from point clouds. To this end, we first project point clouds to the camera coordinates to provide spatio-depth information for RGB images. Then, we propose a two-stream network to extract features from the two modalities, separately, and fuse the features by effective residual-based fusion modules. Moreover, we propose additional perception-aware losses to measure the great perceptual difference between the two modalities. Extensive experiments on two benchmark data sets show the superiority of our method. For example, on nuScenes, our PMF outperforms the state-of-the-art method by 0.8% in mIoU.
Deep learning systems are typically designed to perform for a wide range of test inputs. For example, deep learning systems in autonomous cars are supposed to deal with traffic situations for which they were not specifically trained. In general, the ability to cope with a broad spectrum of unseen test inputs is called generalization. Generalization is definitely important in applications where the possible test inputs are known but plentiful or simply unknown, but there are also cases where the possible inputs are few and unlabeled but known beforehand. For example, medicine is currently interested in targeting treatments to individual patients; the number of patients at any given time is usually small (typically one), their diagnoses/responses/... are still unknown, but their general characteristics (such as genome information, protein levels in the blood, and so forth) are known before the treatment. We propose to call deep learning in such applications targeted deep learning. In this paper, we introduce a framework for targeted deep learning, and we devise and test an approach for adapting standard pipelines to the requirements of targeted deep learning. The approach is very general yet easy to use: it can be implemented as a simple data-preprocessing step. We demonstrate on a variety of real-world data that our approach can indeed render standard deep learning faster and more accurate when the test inputs are known beforehand.