Point clouds have grown in importance in the way computers perceive the world. From LIDAR sensors in autonomous cars and drones to the time of flight and stereo vision systems in our phones, point clouds are everywhere. Despite their ubiquity, point clouds in the real world are often missing points because of sensor limitations or occlusions, or contain extraneous points from sensor noise or artifacts. These problems challenge algorithms that require computing correspondences between a pair of point clouds. Therefore, this paper presents a fully-convolutional neural network that identifies which points in one point cloud are most similar (inliers) to the points in another. We show improvements in learning-based and classical point cloud registration approaches when retrofitted with our network. We demonstrate these improvements on synthetic and real-world datasets. Finally, our network produces impressive results on test datasets that were unseen during training, thus exhibiting generalizability. Code and videos are available at https://github.com/vinits5/masknet
Autonomous vehicles commonly rely on highly detailed birds-eye-view maps of their environment, which capture both static elements of the scene such as road layout as well as dynamic elements such as other cars and pedestrians. Generating these map representations on the fly is a complex multi-stage process which incorporates many important vision-based elements, including ground plane estimation, road segmentation and 3D object detection. In this work we present a simple, unified approach for estimating maps directly from monocular images using a single end-to-end deep learning architecture. For the maps themselves we adopt a semantic Bayesian occupancy grid framework, allowing us to trivially accumulate information over multiple cameras and timesteps. We demonstrate the effectiveness of our approach by evaluating against several challenging baselines on the NuScenes and Argoverse datasets, and show that we are able to achieve a relative improvement of 9.1% and 22.3% respectively compared to the best-performing existing method.
Despite the successful demonstration of autonomous vehicles (AVs), such as self-driving cars, ensuring AV safety remains a challenging task. Although some actors influence an AV's driving decisions more than others, current approaches pay equal attention to each actor on the road. An actor's influence on the AV's decision can be characterized in terms of its ability to decrease the number of safe navigational choices for the AV. In this work, we propose a safety threat indicator (STI) using counterfactual reasoning to estimate the importance of each actor on the road with respect to its influence on the AV's safety. We use this indicator to (i) characterize the existing real-world datasets to identify rare hazardous scenarios as well as the poor performance of existing controllers in such scenarios; and (ii) design an RL based safety mitigation controller to proactively mitigate the safety hazards those actors pose to the AV. Our approach reduces the accident rate for the state-of-the-art AV agent(s) in rare hazardous scenarios by more than 70%.
In adversarial attacks intended to confound deep learning models, most studies have focused on limiting the magnitude of the modification so that humans do not notice the attack. On the other hand, during an attack against autonomous cars, for example, most drivers would not find it strange if a small insect image were placed on a stop sign, or they may overlook it. In this paper, we present a systematic approach to generate natural adversarial examples against classification models by employing such natural-appearing perturbations that imitate a certain object or signal. We first show the feasibility of this approach in an attack against an image classifier by employing generative adversarial networks that produce image patches that have the appearance of a natural object to fool the target model. We also introduce an algorithm to optimize placement of the perturbation in accordance with the input image, which makes the generation of adversarial examples fast and likely to succeed. Moreover, we experimentally show that the proposed approach can be extended to the audio domain, for example, to generate perturbations that sound like the chirping of birds to fool a speech classifier.
Many real-world applications need path planning algorithms to solve tasks in different areas, such as social applications, autonomous cars, and tracking activities. And most importantly motion planning. Although the use of path planning is sufficient in most motion planning scenarios, they represent potential bottlenecks in large environments with dynamic changes. To tackle this problem, the number of possible routes could be reduced to make it easier for path planning algorithms to find the shortest path with less efforts. An traditional algorithm for path planning is the A*, it uses an heuristic to work faster than other solutions. In this work, we propose a CNN encoder capable of eliminating useless routes for motion planning problems, then we combine the proposed neural network output with A*. To measure the efficiency of our solution, we propose a database with different scenarios of motion planning problems. The evaluated metric is the number of the iterations to find the shortest path. The A* was compared with the CNN Encoder (proposal) with A*. In all evaluated scenarios, our solution reduced the number of iterations by more than 60\%.
One desirable capability of autonomous cars is to accurately predict the pedestrian motion near intersections for safe and efficient trajectory planning. We are interested in developing transfer learning algorithms that can be trained on the pedestrian trajectories collected at one intersection and yet still provide accurate predictions of the trajectories at another, previously unseen intersection. We first discussed the feature selection for transferable pedestrian motion models in general. Following this discussion, we developed one transferable pedestrian motion prediction algorithm based on Inverse Reinforcement Learning (IRL) that infers pedestrian intentions and predicts future trajectories based on observed trajectory. We evaluated our algorithm on a dataset collected at two intersections, trained at one intersection and tested at the other intersection. We used the accuracy of augmented semi-nonnegative sparse coding (ASNSC), trained and tested at the same intersection as a baseline. The result shows that the proposed algorithm improves the baseline accuracy by 40% in the non-transfer task, and 16% in the transfer task.
Visual Place Recognition and Visual Localization are essential components in navigation and mapping for autonomous vehicles especially in GNSS-denied navigation scenarios. Recent work has focused on ground or close to ground applications such as self-driving cars or indoor-scenarios and low-altitude drone flights. However, applications such as Urban Air Mobility require operations in large-scale outdoor environments at medium to high altitudes. We present a new dataset named VPAIR. The dataset was recorded on board a light aircraft flying at an altitude of more than 300 meters above ground capturing images with a downwardfacing camera. Each image is paired with a high resolution reference render including dense depth information and 6-DoF reference poses. The dataset covers a more than one hundred kilometers long trajectory over various types of challenging landscapes, e.g. urban, farmland and forests. Experiments on this dataset illustrate the challenges introduced by the change in perspective to a bird's eye view such as in-plane rotations.
Driving heavy-duty vehicles, such as buses and tractor-trailer vehicles, is a difficult task in comparison to passenger cars. Most research on motion planning for autonomous vehicles has focused on passenger vehicles, and many unique challenges associated with heavy-duty vehicles remain open. However, recent works have started to tackle the particular difficulties related to on-road motion planning for buses and tractor-trailer vehicles using numerical optimization approaches. In this work, we propose a framework to design an optimization objective to be used in motion planners. Based on geometric derivations, the method finds the optimal trade-off between the conflicting objectives of centering different axles of the vehicle in the lane. For the buses, we consider the front and rear axles trade-off, whereas for articulated vehicles, we consider the tractor and trailer rear axles trade-off. Our results show that the proposed design strategy results in planned paths that considerably improve the behavior of heavy-duty vehicles by keeping the whole vehicle body in the center of the lane.
Given the fast growth of intelligent devices, it is expected that a large number of high-stake artificial intelligence (AI) applications, e.g., drones, autonomous cars, tactile robots, will be deployed at the edge of wireless networks in the near future. As such, the intelligent communication networks will be designed to leverage advanced wireless techniques and edge computing technologies to support AI-enabled applications at various end devices with limited communication, computation, hardware and energy resources. In this article, we shall present the principles of efficient deployment of model inference at network edge to provide low-latency and energy-efficient AI services. This includes the wireless distributed computing framework for low-latency device distributed model inference as well as the wireless cooperative transmission strategy for energy-efficient edge cooperative model inference. The communication efficiency of edge inference systems is further improved by building up a smart radio propagation environment via intelligent reflecting surface.
Object detection and identification is a challenging area of computer vision and a fundamental requirement for autonomous cars. This project aims to jointly perform object detection of a swap-body and to find the type of swap-body by reading an ILU code using an efficient optical character recognition (OCR) method. Recent research activities have drastically improved deep learning techniques which proves to enhance the field of computer vision. Collecting enough images for training the model is a critical step towards achieving good results. The data for training were collected from different locations with maximum possible variations and the details are explained. In addition, data augmentation methods applied for training has proved to be effective in improving the performance of the trained model. Training the model achieved good results and the test results are also provided. The final model was tested with images and videos. Finally, this paper also draws attention to some of the major challenges faced during various stages of the project and the possible solutions applied.