Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Balaji Lakshminarayanan

Bayesian Deep Ensembles via the Neural Tangent Kernel

Jul 11, 2020

Bobby He, Balaji Lakshminarayanan, Yee Whye Teh

Figure 1 for Bayesian Deep Ensembles via the Neural Tangent Kernel

Figure 2 for Bayesian Deep Ensembles via the Neural Tangent Kernel

Figure 3 for Bayesian Deep Ensembles via the Neural Tangent Kernel

Figure 4 for Bayesian Deep Ensembles via the Neural Tangent Kernel

Abstract:We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK): a recent development in understanding the training dynamics of wide neural networks (NNs). Previous work has shown that even in the infinite width limit, when NNs become GPs, there is no GP posterior interpretation to a deep ensemble trained with squared error loss. We introduce a simple modification to standard deep ensembles training, through addition of a computationally-tractable, randomised and untrainable function to each ensemble member, that enables a posterior interpretation in the infinite width limit. When ensembled together, our trained NNs give an approximation to a posterior predictive distribution, and we prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit. Finally, using finite width NNs we demonstrate that our Bayesian deep ensembles faithfully emulate the analytic posterior predictive when available, and can outperform standard deep ensembles in various out-of-distribution settings, for both regression and classification tasks.

Via

Access Paper or Ask Questions

Revisiting One-vs-All Classifiers for Predictive Uncertainty and Out-of-Distribution Detection in Neural Networks

Jul 10, 2020

Shreyas Padhy, Zachary Nado, Jie Ren, Jeremiah Liu, Jasper Snoek, Balaji Lakshminarayanan

Figure 1 for Revisiting One-vs-All Classifiers for Predictive Uncertainty and Out-of-Distribution Detection in Neural Networks

Figure 2 for Revisiting One-vs-All Classifiers for Predictive Uncertainty and Out-of-Distribution Detection in Neural Networks

Figure 3 for Revisiting One-vs-All Classifiers for Predictive Uncertainty and Out-of-Distribution Detection in Neural Networks

Figure 4 for Revisiting One-vs-All Classifiers for Predictive Uncertainty and Out-of-Distribution Detection in Neural Networks

Abstract:Accurate estimation of predictive uncertainty in modern neural networks is critical to achieve well calibrated predictions and detect out-of-distribution (OOD) inputs. The most promising approaches have been predominantly focused on improving model uncertainty (e.g. deep ensembles and Bayesian neural networks) and post-processing techniques for OOD detection (e.g. ODIN and Mahalanobis distance). However, there has been relatively little investigation into how the parametrization of the probabilities in discriminative classifiers affects the uncertainty estimates, and the dominant method, softmax cross-entropy, results in misleadingly high confidences on OOD data and under covariate shift. We investigate alternative ways of formulating probabilities using (1) a one-vs-all formulation to capture the notion of "none of the above", and (2) a distance-based logit representation to encode uncertainty as a function of distance to the training manifold. We show that one-vs-all formulations can improve calibration on image classification tasks, while matching the predictive performance of softmax without incurring any additional training or test-time complexity.

Via

Access Paper or Ask Questions

Density of States Estimation for Out-of-Distribution Detection

Jun 22, 2020

Warren R. Morningstar, Cusuh Ham, Andrew G. Gallagher, Balaji Lakshminarayanan, Alexander A. Alemi, Joshua V. Dillon

Figure 1 for Density of States Estimation for Out-of-Distribution Detection

Figure 2 for Density of States Estimation for Out-of-Distribution Detection

Figure 3 for Density of States Estimation for Out-of-Distribution Detection

Figure 4 for Density of States Estimation for Out-of-Distribution Detection

Abstract:Perhaps surprisingly, recent studies have shown probabilistic model likelihoods have poor specificity for out-of-distribution (OOD) detection and often assign higher likelihoods to OOD data than in-distribution data. To ameliorate this issue we propose DoSE, the density of states estimator. Drawing on the statistical physics notion of ``density of states,'' the DoSE decision rule avoids direct comparison of model probabilities, and instead utilizes the ``probability of the model probability,'' or indeed the frequency of any reasonable statistic. The frequency is calculated using nonparametric density estimators (e.g., KDE and one-class SVM) which measure the typicality of various model statistics given the training data and from which we can flag test points with low typicality as anomalous. Unlike many other methods, DoSE requires neither labeled data nor OOD examples. DoSE is modular and can be trivially applied to any existing, trained model. We demonstrate DoSE's state-of-the-art performance against other unsupervised OOD detectors on previously established ``hard'' benchmarks.

* Submitted to NeurIPS. Corrected footnote from: "34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada" to "Preprint. Under review."

Via

Access Paper or Ask Questions

Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness

Jun 17, 2020

Jeremiah Zhe Liu, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax-Weiss, Balaji Lakshminarayanan

Figure 1 for Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness

Figure 2 for Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness

Figure 3 for Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness

Figure 4 for Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness

Abstract:Bayesian neural networks (BNN) and deep ensembles are principled approaches to estimate the predictive uncertainty of a deep learning model. However their practicality in real-time, industrial-scale applications are limited due to their heavy memory and inference cost. This motivates us to study principled approaches to high-quality uncertainty estimation that require only a single deep neural network (DNN). By formalizing the uncertainty quantification as a minimax learning problem, we first identify input distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data in the input space, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs, by adding a weight normalization step during training and replacing the output layer with a Gaussian process. On a suite of vision and language understanding tasks and on modern architectures (Wide-ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration and out-of-domain detection, and outperforms the other single-model approaches.

Via

Access Paper or Ask Questions

Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

May 14, 2020

Michael W. Dusenberry, Ghassen Jerfel, Yeming Wen, Yi-an Ma, Jasper Snoek, Katherine Heller, Balaji Lakshminarayanan, Dustin Tran

Figure 1 for Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

Figure 2 for Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

Figure 3 for Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

Figure 4 for Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

Abstract:Bayesian neural networks (BNNs) demonstrate promising success in improving the robustness and uncertainty quantification of modern deep learning. However, they generally struggle with underfitting at scale and parameter efficiency. On the other hand, deep ensembles have emerged as alternatives for uncertainty quantification that, while outperforming BNNs on certain problems, also suffer from efficiency issues. It remains unclear how to combine the strengths of these two approaches and remediate their common issues. To tackle this challenge, we propose a rank-1 parameterization of BNNs, where each weight matrix involves only a distribution on a rank-1 subspace. We also revisit the use of mixture approximate posteriors to capture multiple modes, where unlike typical mixtures, this approach admits a significantly smaller memory increase (e.g., only a 0.4% increase for a ResNet-50 mixture of size 10). We perform a systematic empirical study on the choices of prior, variational posterior, and methods to improve training. For ResNet-50 on ImageNet, Wide ResNet 28-10 on CIFAR-10/100, and an RNN on MIMIC-III, rank-1 BNNs achieve state-of-the-art performance across log-likelihood, accuracy, and calibration on the test sets and out-of-distribution variants.

* Code available at https://github.com/google/edward2

Via

Access Paper or Ask Questions

AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

Dec 05, 2019

Dan Hendrycks, Norman Mu, Ekin D. Cubuk, Barret Zoph, Justin Gilmer, Balaji Lakshminarayanan

Figure 1 for AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

Figure 2 for AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

Figure 3 for AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

Figure 4 for AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

Abstract:Modern deep neural networks can achieve high accuracy when the training distribution and test distribution are identically distributed, but this assumption is frequently violated in practice. When the train and test distributions are mismatched, accuracy can plummet. Currently there are few techniques that improve robustness to unforeseen data shifts encountered during deployment. In this work, we propose a technique to improve the robustness and uncertainty estimates of image classifiers. We propose AugMix, a data processing technique that is simple to implement, adds limited computational overhead, and helps models withstand unforeseen corruptions. AugMix significantly improves robustness and uncertainty measures on challenging image classification benchmarks, closing the gap between previous methods and the best possible performance in some cases by more than half.

* Code available at https://github.com/google-research/augmix

Via

Access Paper or Ask Questions

Normalizing Flows for Probabilistic Modeling and Inference

Dec 05, 2019

George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, Balaji Lakshminarayanan

Figure 1 for Normalizing Flows for Probabilistic Modeling and Inference

Figure 2 for Normalizing Flows for Probabilistic Modeling and Inference

Figure 3 for Normalizing Flows for Probabilistic Modeling and Inference

Figure 4 for Normalizing Flows for Probabilistic Modeling and Inference

Abstract:Normalizing flows provide a general mechanism for defining expressive probability distributions, only requiring the specification of a (usually simple) base distribution and a series of bijective transformations. There has been much recent work on normalizing flows, ranging from improving their expressive power to expanding their application. We believe the field has now matured and is in need of a unified perspective. In this review, we attempt to provide such a perspective by describing flows through the lens of probabilistic modeling and inference. We place special emphasis on the fundamental principles of flow design, and discuss foundational topics such as expressive power and computational trade-offs. We also broaden the conceptual framing of flows by relating them to more general probability transformations. Lastly, we summarize the use of flows for tasks such as generative modeling, approximate inference, and supervised learning.

* Review article. 60 pages, 4 figures

Via

Access Paper or Ask Questions

Deep Ensembles: A Loss Landscape Perspective

Dec 05, 2019

Stanislav Fort, Huiyi Hu, Balaji Lakshminarayanan

Figure 1 for Deep Ensembles: A Loss Landscape Perspective

Figure 2 for Deep Ensembles: A Loss Landscape Perspective

Figure 3 for Deep Ensembles: A Loss Landscape Perspective

Figure 4 for Deep Ensembles: A Loss Landscape Perspective

Abstract:Deep ensembles have been empirically shown to be a promising approach for improving accuracy, uncertainty and out-of-distribution robustness of deep learning models. While deep ensembles were theoretically motivated by the bootstrap, non-bootstrap ensembles trained with just random initialization also perform well in practice, which suggests that there could be other explanations for why deep ensembles work well. Bayesian neural networks, which learn distributions over the parameters of the network, are theoretically well-motivated by Bayesian principles, but do not perform as well as deep ensembles in practice, particularly under dataset shift. One possible explanation for this gap between theory and practice is that popular scalable approximate Bayesian methods tend to focus on a single mode, whereas deep ensembles tend to explore diverse modes in function space. We investigate this hypothesis by building on recent work on understanding the loss landscape of neural networks and adding our own exploration to measure the similarity of functions in the space of predictions. Our results show that random initializations explore entirely different modes, while functions along an optimization trajectory or sampled from the subspace thereof cluster within a single mode predictions-wise, while often deviating significantly in the weight space. We demonstrate that while low-loss connectors between modes exist, they are not connected in the space of predictions. Developing the concept of the diversity--accuracy plane, we show that the decorrelation power of random initializations is unmatched by popular subspace sampling methods.

Via

Access Paper or Ask Questions

Detecting Out-of-Distribution Inputs to Deep Generative Models Using a Test for Typicality

Jun 07, 2019

Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Balaji Lakshminarayanan

Figure 1 for Detecting Out-of-Distribution Inputs to Deep Generative Models Using a Test for Typicality

Figure 2 for Detecting Out-of-Distribution Inputs to Deep Generative Models Using a Test for Typicality

Figure 3 for Detecting Out-of-Distribution Inputs to Deep Generative Models Using a Test for Typicality

Figure 4 for Detecting Out-of-Distribution Inputs to Deep Generative Models Using a Test for Typicality

Abstract:Recent work has shown that deep generative models can assign higher likelihood to out-of-distribution data sets than to their training data. We posit that this phenomenon is caused by a mismatch between the model's typical set and its areas of high probability density. In-distribution inputs should reside in the former but not necessarily in the latter, as previous work has presumed. To determine whether or not inputs reside in the typical set, we propose a statistically principled, easy-to-implement test using the empirical distribution of model likelihoods. The test is model agnostic and widely applicable, only requiring that the likelihood can be computed or closely approximated. We report experiments showing that our procedure can successfully detect the out-of-distribution sets in several of the challenging cases reported by Nalisnick et al. (2019).

Via

Access Paper or Ask Questions

Likelihood Ratios for Out-of-Distribution Detection

Jun 07, 2019

Jie Ren, Peter J. Liu, Emily Fertig, Jasper Snoek, Ryan Poplin, Mark A. DePristo, Joshua V. Dillon, Balaji Lakshminarayanan

Figure 1 for Likelihood Ratios for Out-of-Distribution Detection

Figure 2 for Likelihood Ratios for Out-of-Distribution Detection

Figure 3 for Likelihood Ratios for Out-of-Distribution Detection

Figure 4 for Likelihood Ratios for Out-of-Distribution Detection

Abstract:Discriminative neural networks offer little or no performance guarantees when deployed on data not generated by the same process as the training distribution. On such out-of-distribution (OOD) inputs, the prediction may not only be erroneous, but confidently so, limiting the safe deployment of classifiers in real-world applications. One such challenging application is bacteria identification based on genomic sequences, which holds the promise of early detection of diseases, but requires a model that can output low confidence predictions on OOD genomic sequences from new bacteria that were not present in the training data. We introduce a genomics dataset for OOD detection that allows other researchers to benchmark progress on this important problem. We investigate deep generative model based approaches for OOD detection and observe that the likelihood score is heavily affected by population level background statistics. We propose a likelihood ratio method for deep generative models which effectively corrects for these confounding background statistics. We benchmark the OOD detection performance of the proposed method against existing approaches on the genomics dataset and show that our method achieves state-of-the-art performance. We demonstrate the generality of the proposed method by showing that it significantly improves OOD detection when applied to deep generative models of images.

Via

Access Paper or Ask Questions