
Alex Kendall

Urban Driving with Conditional Imitation Learning

Dec 05, 2019

Jeffrey Hawke, Richard Shen, Corina Gurau, Siddharth Sharma, Daniele Reda, Nikolay Nikolov, Przemyslaw Mazur, Sean Micklethwaite, Nicolas Griffiths, Amar Shah(+1 more)

Abstract: Hand-crafting generalised decision-making rules for real-world urban autonomous driving is hard. Alternatively, learning behaviour from easy-to-collect human driving demonstrations is appealing. Prior work has studied imitation learning (IL) for autonomous driving with a number of limitations. Examples include performing only lane-following rather than following a user-defined route; using only a single camera view or heavily cropped frames that lack state observability; providing only lateral (steering) control but not longitudinal (speed) control; and a lack of interaction with traffic. Importantly, the majority of such systems have been evaluated primarily in simulation, a simple domain that lacks real-world complexities. Motivated by these challenges, we focus on learning representations of semantics, geometry and motion with computer vision for IL from human driving demonstrations. As our main contribution, we present an end-to-end conditional imitation learning approach, combining both lateral and longitudinal control on a real vehicle for following urban routes with simple traffic. We address inherent dataset bias by data balancing, training our final policy on approximately 30 hours of demonstrations gathered over six months. We evaluate our method on an autonomous vehicle by driving 35 km of novel routes in European urban streets.

* Under submission; added acknowledgements
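The abstract does not describe the network, so the following is only a minimal sketch of the general conditional-imitation idea: shared features feed one output head per high-level route command, and the command selects which head produces the lateral (steering) and longitudinal (speed) outputs. `LinearHead`, `select_action`, the command names and all weights are hypothetical illustrations, not the paper's architecture.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LinearHead:
    """Hypothetical output head mapping shared features to (steering, speed)."""
    w_steer: List[float]
    w_speed: List[float]

    def __call__(self, features: List[float]) -> Tuple[float, float]:
        steer = sum(w * f for w, f in zip(self.w_steer, features))
        speed = sum(w * f for w, f in zip(self.w_speed, features))
        return steer, speed


def select_action(features: List[float], command: str,
                  heads: Dict[str, LinearHead]) -> Tuple[float, float]:
    """Use the route command to pick which head produces the controls."""
    if command not in heads:
        raise ValueError(f"unknown route command: {command!r}")
    return heads[command](features)


# Hypothetical commands and weights, for illustration only.
heads = {
    "follow": LinearHead([0.0, 0.0], [1.0, 0.0]),
    "left": LinearHead([-0.5, 0.0], [0.5, 0.0]),
    "right": LinearHead([0.5, 0.0], [0.5, 0.0]),
}
features = [1.0, 2.0]  # stands in for the shared perception features
print(select_action(features, "left", heads))   # (-0.5, 0.5)
print(select_action(features, "right", heads))  # (0.5, 0.5)
```

The same features yield different controls depending on the command, which is what lets a user-defined route steer the policy at junctions.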

Learning to Drive from Simulation without Real World Labels

Dec 13, 2018

Alex Bewley, Jessica Rigley, Yuxuan Liu, Jeffrey Hawke, Richard Shen, Vinh-Dieu Lam, Alex Kendall

Abstract: Simulation can be a powerful tool for understanding machine learning systems and designing methods to solve real-world problems. Training and evaluating methods purely in simulation is often "doomed to succeed" at the desired task in a simulated environment, but the resulting models are incapable of operation in the real world. Here we present and evaluate a method for transferring a vision-based lane following driving policy from simulation to operation on a rural road without any real-world labels. Our approach leverages recent advances in image-to-image translation to achieve domain transfer while jointly learning a single-camera control policy from simulation control labels. We assess the driving performance of this method using both open-loop regression metrics and closed-loop performance operating an autonomous vehicle on rural and urban roads.

Orthographic Feature Transform for Monocular 3D Object Detection

Nov 20, 2018

Thomas Roddick, Alex Kendall, Roberto Cipolla

Abstract: 3D object detection from monocular images has proven to be an enormously challenging task, with the performance of leading systems not yet achieving even 10% of that of LiDAR-based counterparts. One explanation for this performance gap is that existing systems are entirely at the mercy of the perspective image-based representation, in which the appearance and scale of objects vary drastically with depth and meaningful distances are difficult to infer. In this work we argue that the ability to reason about the world in 3D is an essential element of the 3D object detection task. To this end, we introduce the orthographic feature transform, which enables us to escape the image domain by mapping image-based features into an orthographic 3D space. This allows us to reason holistically about the spatial configuration of the scene in a domain where scale is consistent and distances between objects are meaningful. We apply this transformation as part of an end-to-end deep learning architecture and achieve state-of-the-art performance on the KITTI 3D object benchmark. We will release full source code and pretrained models upon acceptance of this manuscript for publication.
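The core step described above, filling each cell of an orthographic 3D grid with image features, can be sketched as follows under simplifying assumptions: a single-channel feature map, a pinhole camera with hypothetical intrinsics `(fx, fy, cx, cy)`, and average pooling over the image box spanned by each voxel's eight projected corners, computed in constant time per voxel with an integral image. Feature channels, grid resolution and box rounding in the actual paper are not given in the abstract; all names here are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def integral_image(feat: Grid) -> Grid:
    """S[y][x] holds the sum of feat over rows < y and columns < x."""
    h, w = len(feat), len(feat[0])
    S = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            S[y + 1][x + 1] = feat[y][x] + S[y][x + 1] + S[y + 1][x] - S[y][x]
    return S


def box_mean(S: Grid, x0: int, y0: int, x1: int, y1: int) -> float:
    """Mean of feat over rows [y0, y1) and columns [x0, x1) in O(1)."""
    total = S[y1][x1] - S[y0][x1] - S[y1][x0] + S[y0][x0]
    return total / ((x1 - x0) * (y1 - y0))


def project(p: Sequence[float], fx: float, fy: float,
            cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point with Z > 0."""
    X, Y, Z = p
    return fx * X / Z + cx, fy * Y / Z + cy


def voxel_feature(S: Grid, centre: Sequence[float], half: float,
                  intrinsics: Tuple[float, float, float, float]) -> float:
    """Average image feature inside the projected bounding box of one voxel."""
    h, w = len(S) - 1, len(S[0]) - 1
    us, vs = [], []
    for dx in (-half, half):
        for dy in (-half, half):
            for dz in (-half, half):
                corner = (centre[0] + dx, centre[1] + dy, centre[2] + dz)
                u, v = project(corner, *intrinsics)
                us.append(u)
                vs.append(v)
    # Clamp the box to the image; a voxel that projects off-image gets zero.
    x0 = max(0, math.floor(min(us)))
    x1 = min(w, math.ceil(max(us)))
    y0 = max(0, math.floor(min(vs)))
    y1 = min(h, math.ceil(max(vs)))
    if x1 <= x0 or y1 <= y0:
        return 0.0
    return box_mean(S, x0, y0, x1, y1)


feat = [[float(x + y) for x in range(8)] for y in range(6)]  # toy 6x8 feature map
S = integral_image(feat)
print(voxel_feature(S, (0.0, 0.0, 5.0), 0.5, (4.0, 4.0, 4.0, 3.0)))  # 6.0
```

Because the integral image is built once per feature map, every voxel costs the same regardless of how large its projected box is, which matters when nearby voxels cover many pixels.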

Learning to Drive in a Day

Sep 11, 2018

Alex Kendall, Jeffrey Hawke, David Janz, Przemyslaw Mazur, Daniele Reda, John-Mark Allen, Vinh-Dieu Lam, Alex Bewley, Amar Shah

Abstract: We demonstrate the first application of deep reinforcement learning to autonomous driving. From randomly initialised parameters, our model is able to learn a policy for lane following in a handful of training episodes using a single monocular image as input. We provide a general and easy to obtain reward: the distance travelled by the vehicle without the safety driver taking control. We use a continuous, model-free deep reinforcement learning algorithm, with all exploration and optimisation performed on-vehicle. This demonstrates a new framework for autonomous driving which moves away from reliance on defined logical rules, mapping, and direct supervision. We discuss the challenges and opportunities to scale this approach to a broader range of autonomous driving tasks.

* Further results and demo videos can be viewed at: https://wayve.ai/blog/l2diad
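The reward stated above, distance travelled before the safety driver takes control, can be sketched as a per-step reward of speed times time step, with the episode ending at the first disengagement. The `Step` record, the fixed time step `dt` and `episode_rewards` are hypothetical; the abstract does not say how distance is measured on the vehicle.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Step:
    speed_mps: float   # vehicle speed during this step (m/s)
    disengaged: bool   # True if the safety driver took control


def episode_rewards(steps: Iterable[Step], dt: float) -> List[float]:
    """Per-step reward = distance covered; the first disengagement ends the episode."""
    rewards = []
    for step in steps:
        if step.disengaged:
            break  # intervention terminates the episode; no further reward
        rewards.append(step.speed_mps * dt)
    return rewards


# Three steps at 2 m/s with dt = 0.5 s, then the driver intervenes.
log = [Step(2.0, False), Step(2.0, False), Step(2.0, False),
       Step(2.0, True), Step(2.0, False)]
print(sum(episode_rewards(log, dt=0.5)))  # 3.0 metres before the intervention
```

The appeal of this reward is that it needs no labels or maps: the episode return is simply how far the car got on its own.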

Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics

Apr 24, 2018

Alex Kendall, Yarin Gal, Roberto Cipolla

Abstract: Numerous deep learning applications benefit from multi-task learning with multiple regression and classification objectives. In this paper we make the observation that the performance of such systems is strongly dependent on the relative weighting between each task's loss. Tuning these weights by hand is a difficult and expensive process, making multi-task learning prohibitive in practice. We propose a principled approach to multi-task deep learning which weighs multiple loss functions by considering the homoscedastic uncertainty of each task. This allows us to simultaneously learn various quantities with different units or scales in both classification and regression settings. We demonstrate our model learning per-pixel depth regression, semantic and instance segmentation from a monocular input image. Perhaps surprisingly, we show our model can learn multi-task weightings and outperform separate models trained individually on each task.

* CVPR 2018
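A minimal sketch of uncertainty-based loss weighting for regression-style tasks: each task i has a learned log variance s_i = log(sigma_i^2), and the combined loss is sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i, which weights L_i by 1/(2 sigma_i^2) and adds log(sigma_i). Learning the log variance rather than sigma is assumed here for numerical stability. The fixed task losses and the plain gradient-descent loop are a hypothetical toy check, not the paper's training setup. Setting the derivative to zero gives s_i = log(L_i), so a task with a larger loss is automatically down-weighted.

```python
import math
from typing import List, Sequence


def uncertainty_weighted_loss(losses: Sequence[float],
                              log_vars: Sequence[float]) -> float:
    """Sum over tasks of 0.5 * exp(-s) * L + 0.5 * s, with s = log(sigma^2)."""
    return sum(0.5 * math.exp(-s) * L + 0.5 * s for L, s in zip(losses, log_vars))


def grad_log_vars(losses: Sequence[float],
                  log_vars: Sequence[float]) -> List[float]:
    """Analytic gradient of the combined loss with respect to each s."""
    return [-0.5 * math.exp(-s) * L + 0.5 for L, s in zip(losses, log_vars)]


# Toy check: hold two hypothetical task losses fixed, optimise only the s values.
losses = [4.0, 0.25]
log_vars = [0.0, 0.0]
for _ in range(2000):
    g = grad_log_vars(losses, log_vars)
    log_vars = [s - 0.1 * gi for s, gi in zip(log_vars, g)]

print([round(s, 3) for s in log_vars])                   # [1.386, -1.386]
print([round(0.5 * math.exp(-s), 3) for s in log_vars])  # [0.125, 2.0]
```

The printed effective weights equal 0.5 / L_i, so the task with the larger loss contributes less, and the 0.5 * s term stops the weights from collapsing to zero.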

What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?

Oct 05, 2017

Alex Kendall, Yarin Gal

Abstract: There are two major types of uncertainty one can model. Aleatoric uncertainty captures noise inherent in the observations. On the other hand, epistemic uncertainty accounts for uncertainty in the model, which can be explained away given enough data. Traditionally it has been difficult to model epistemic uncertainty in computer vision, but with new Bayesian deep learning tools this is now possible. We study the benefits of modeling epistemic vs. aleatoric uncertainty in Bayesian deep learning models for vision tasks. For this we present a Bayesian deep learning framework combining input-dependent aleatoric uncertainty together with epistemic uncertainty. We study models under the framework with per-pixel semantic segmentation and depth regression tasks. Further, our explicit uncertainty formulation leads to new loss functions for these tasks, which can be interpreted as learned attenuation. This makes the loss more robust to noisy data, also giving new state-of-the-art results on segmentation and depth regression benchmarks.

* NIPS 2017
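For regression, the two ideas above can be sketched in a few lines. Given T stochastic forward passes (for example with Monte Carlo dropout), each returning a mean y_t and an aleatoric variance sigma_t^2, the predictive variance is commonly estimated as mean(y_t^2) - mean(y_t)^2 + mean(sigma_t^2): the first two terms are the epistemic part, the last the aleatoric part. The learned-attenuation loss scales each squared residual by exp(-s), where s is the predicted log variance, and adds 0.5 * s as a penalty. The per-pass outputs below are made-up numbers, and the function names are illustrative.

```python
import math
from statistics import fmean
from typing import Sequence, Tuple


def predictive_variance(means: Sequence[float],
                        variances: Sequence[float]) -> Tuple[float, float]:
    """Split the variance of T stochastic passes into (epistemic, aleatoric)."""
    mu = fmean(means)
    epistemic = fmean(m * m for m in means) - mu * mu  # spread across passes
    aleatoric = fmean(variances)                       # average predicted noise
    return epistemic, aleatoric


def attenuated_l2(y: Sequence[float], y_hat: Sequence[float],
                  log_vars: Sequence[float]) -> float:
    """Mean of 0.5*exp(-s)*(y - y_hat)^2 + 0.5*s over pixels."""
    # A large predicted s attenuates a noisy residual but pays 0.5*s for it.
    return fmean(0.5 * math.exp(-s) * (t - p) ** 2 + 0.5 * s
                 for t, p, s in zip(y, y_hat, log_vars))


print(predictive_variance([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))  # epistemic ~0.667, aleatoric 0.5
```

Predicting s = log(sigma^2) instead of sigma^2 keeps the loss defined for any network output, since exp(-s) is always positive.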

Geometric Loss Functions for Camera Pose Regression with Deep Learning

May 23, 2017

Alex Kendall, Roberto Cipolla

Abstract: Deep learning has been shown to be effective for robust and real-time monocular image relocalisation. In particular, PoseNet is a deep convolutional neural network which learns to regress the 6-DOF camera pose from a single image. It learns to localize using high-level features and is robust to difficult lighting, motion blur and unknown camera intrinsics, where point-based SIFT registration fails. However, it was trained using a naive loss function, with hyper-parameters which require expensive tuning. In this paper, we give the problem a more fundamental theoretical treatment. We explore a number of novel loss functions for learning camera pose which are based on geometry and scene reprojection error. Additionally we show how to automatically learn an optimal weighting to simultaneously regress position and orientation. By leveraging geometry, we demonstrate that our technique significantly improves PoseNet's performance across datasets ranging from indoor rooms to a small city.

* CVPR 2017
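One of the geometric losses mentioned above is scene reprojection error: project known 3D scene points with both the predicted and the ground-truth camera pose and penalise the distance between the resulting pixel coordinates, so that position and orientation errors are measured in a single unit (pixels). This sketch rests on several assumptions: poses are world-to-camera transforms given as a unit quaternion `(w, x, y, z)` plus a translation, the camera is a pinhole with hypothetical intrinsics, and the per-point error is an L1 pixel distance averaged over points.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit length


def cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate(q: Quat, v: Sequence[float]) -> Vec3:
    """Rotate v by unit quaternion q: v' = v + w*t + q_vec x t, with t = 2 q_vec x v."""
    w, qv = q[0], q[1:]
    t = tuple(2.0 * c for c in cross(qv, v))
    c2 = cross(qv, t)
    return tuple(v[i] + w * t[i] + c2[i] for i in range(3))


def project(q: Quat, trans: Sequence[float], point: Sequence[float],
            f: float, cx: float, cy: float) -> Tuple[float, float]:
    """World point -> camera frame (R p + t) -> pixel coordinates."""
    X, Y, Z = (r + ti for r, ti in zip(rotate(q, point), trans))
    return f * X / Z + cx, f * Y / Z + cy


def reprojection_loss(q_pred: Quat, t_pred: Sequence[float],
                      q_true: Quat, t_true: Sequence[float],
                      points: Sequence[Sequence[float]],
                      f: float = 100.0, cx: float = 0.0, cy: float = 0.0) -> float:
    """Mean L1 pixel distance between points projected with the two poses."""
    total = 0.0
    for p in points:
        u1, v1 = project(q_pred, t_pred, p, f, cx, cy)
        u2, v2 = project(q_true, t_true, p, f, cx, cy)
        total += abs(u1 - u2) + abs(v1 - v2)
    return total / len(points)


identity = (1.0, 0.0, 0.0, 0.0)
# A 0.2 m sideways error on a point 2 m away shifts it by 100 * 0.2 / 2 = 10 px.
print(reprojection_loss(identity, (0.2, 0.0, 0.0), identity, (0.0, 0.0, 0.0),
                        [(0.0, 0.0, 2.0)]))
```

Unlike a weighted sum of position and quaternion errors, this loss needs no hand-tuned balance between the two, at the cost of requiring 3D scene points.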

Concrete Dropout

May 22, 2017

Yarin Gal, Jiri Hron, Alex Kendall

Abstract: Dropout is used as a practical tool to obtain uncertainty estimates in large vision models and reinforcement learning (RL) tasks. But to obtain well-calibrated uncertainty estimates, a grid search over the dropout probabilities is necessary, a prohibitive operation with large models and an impossible one with RL. We propose a new dropout variant which gives improved performance and better calibrated uncertainties. Relying on recent developments in Bayesian deep learning, we use a continuous relaxation of dropout's discrete masks. Together with a principled optimisation objective, this allows for automatic tuning of the dropout probability in large models, and as a result faster experimentation cycles. In RL this allows the agent to adapt its uncertainty dynamically as more data is observed. We analyse the proposed variant extensively on a range of tasks, and give insights into common practice in the field where larger dropout probabilities are often used in deeper model layers.
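The continuous relaxation mentioned above replaces each Bernoulli dropout mask with a sample from a relaxed (concrete) Bernoulli distribution, which is differentiable with respect to the dropout probability p. This sketch uses the standard binary-concrete form z = sigmoid((log p - log(1 - p) + log u - log(1 - u)) / tau) with u ~ Uniform(0, 1), where z near 1 means "drop". The temperature tau = 0.1 and the 1/(1 - p) rescaling are assumptions borrowed from common dropout practice, and the regularisation term of the paper's objective is omitted.

```python
import math
import random
from typing import List, Optional


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def concrete_keep_mask(n: int, p: float, temperature: float = 0.1,
                       rng: Optional[random.Random] = None,
                       eps: float = 1e-7) -> List[float]:
    """Relaxed dropout mask: near 0 for dropped units, near 1/(1-p) for kept ones."""
    rng = rng or random.Random()
    mask = []
    for _ in range(n):
        u = min(max(rng.random(), eps), 1.0 - eps)  # keep the logs finite
        logit = (math.log(p) - math.log(1.0 - p)
                 + math.log(u) - math.log(1.0 - u)) / temperature
        drop = sigmoid(logit)                   # relaxed "drop" indicator in (0, 1)
        mask.append((1.0 - drop) / (1.0 - p))  # inverted-dropout rescaling
    return mask


m = concrete_keep_mask(20000, p=0.3, rng=random.Random(0))
print(sum(m) / len(m))  # close to 1.0, as with ordinary inverted dropout
```

Because every operation between p and the mask is smooth, gradients of the training loss can flow into p, which is what makes automatic tuning of the dropout rate possible.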

End-to-End Learning of Geometry and Context for Deep Stereo Regression

Mar 13, 2017

Alex Kendall, Hayk Martirosyan, Saumitro Dasgupta, Peter Henry, Ryan Kennedy, Abraham Bachrach, Adam Bry

Abstract: We propose a novel deep learning architecture for regressing disparity from a rectified pair of stereo images. We leverage knowledge of the problem's geometry to form a cost volume using deep feature representations. We learn to incorporate contextual information using 3-D convolutions over this volume. Disparity values are regressed from the cost volume using a proposed differentiable soft argmin operation, which allows us to train our method end-to-end to sub-pixel accuracy without any additional post-processing or regularization. We evaluate our method on the Scene Flow and KITTI datasets and on KITTI we set a new state-of-the-art benchmark, while being significantly faster than competing approaches.
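The differentiable soft argmin described above turns the matching costs c_d over candidate disparities d = 0, ..., D-1 into a sub-pixel estimate: the expected disparity under softmax(-c). The sketch shows the computation for a single pixel in plain Python with illustrative cost values; in the network the same operation runs over every pixel of the filtered cost volume.

```python
import math
from typing import Sequence


def soft_argmin(costs: Sequence[float]) -> float:
    """Expected disparity under softmax(-cost); differentiable, unlike argmin."""
    neg = [-c for c in costs]
    m = max(neg)                              # subtract the max for stability
    weights = [math.exp(v - m) for v in neg]
    z = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / z


print(soft_argmin([9.0, 0.0, 9.0, 9.0]))    # about 1.0: one sharp minimum
print(soft_argmin([10.0, 0.0, 0.0, 10.0]))  # about 1.5: sub-pixel, between 1 and 2
print(soft_argmin([5.0, 5.0, 5.0, 5.0]))    # 1.5: flat costs give the mean disparity
```

The last case shows the operator's weakness: a flat or multi-modal cost curve yields an average rather than a confident choice, so it relies on the network learning sharp, unimodal costs.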

Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding

Oct 10, 2016

Alex Kendall, Vijay Badrinarayanan, Roberto Cipolla

Abstract: We present a deep learning framework for probabilistic pixel-wise semantic segmentation, which we term Bayesian SegNet. Semantic segmentation is an important tool for visual scene understanding and a meaningful measure of uncertainty is essential for decision making. Our contribution is a practical system which is able to predict pixel-wise class labels with a measure of model uncertainty. We achieve this by Monte Carlo sampling with dropout at test time to generate a posterior distribution of pixel class labels. In addition, we show that modelling uncertainty improves segmentation performance by 2-3% across a number of state-of-the-art architectures such as SegNet, FCN and Dilation Network, with no additional parametrisation. We also observe a significant improvement in performance for smaller datasets where modelling uncertainty is more effective. We benchmark Bayesian SegNet on the indoor SUN Scene Understanding and outdoor CamVid driving scenes datasets.
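Test-time Monte Carlo dropout, as described above, keeps dropout active at inference, runs T stochastic forward passes and averages the softmax outputs; the class with the highest mean probability is the prediction, and the spread across passes (here the per-class variance) serves as an uncertainty estimate. The sketch applies this to a single pixel with a hypothetical toy linear classifier (`toy_logits`, its weights and the dropout rate are all made up); Bayesian SegNet itself uses a full encoder-decoder and handles every pixel at once.

```python
import math
import random
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def toy_logits(x: Sequence[float], weights: Sequence[Sequence[float]],
               p_drop: float, rng: random.Random) -> List[float]:
    """Hypothetical linear classifier with inverted dropout on its inputs."""
    kept = [xi / (1.0 - p_drop) if rng.random() >= p_drop else 0.0 for xi in x]
    return [sum(w * k for w, k in zip(row, kept)) for row in weights]


def mc_dropout_predict(x: Sequence[float], weights: Sequence[Sequence[float]],
                       p_drop: float = 0.5, T: int = 50,
                       seed: int = 0) -> Tuple[int, List[float], List[float]]:
    """Return (predicted class, mean class probabilities, per-class variance)."""
    rng = random.Random(seed)
    samples = [softmax(toy_logits(x, weights, p_drop, rng)) for _ in range(T)]
    n_classes = len(weights)
    mean = [fmean(s[c] for s in samples) for c in range(n_classes)]
    var = [pvariance([s[c] for s in samples]) for c in range(n_classes)]
    pred = max(range(n_classes), key=lambda c: mean[c])
    return pred, mean, var


weights = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.5, 0.5, 0.5]]
pred, mean, var = mc_dropout_predict([1.0, 0.2, 0.1], weights)
print(pred, [round(p, 3) for p in mean], [round(v, 4) for v in var])
```

Setting `p_drop=0.0` makes every pass identical, so the variance collapses to zero; the uncertainty signal comes entirely from the dropout sampling.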
