Zoubin Ghahramani

Variational Bayesian dropout: pitfalls and fixes

Jul 05, 2018
Jiri Hron, Alexander G. de G. Matthews, Zoubin Ghahramani

Dropout, a stochastic regularisation technique for training of neural networks, has recently been reinterpreted as a specific type of approximate inference algorithm for Bayesian neural networks. The main contribution of the reinterpretation is in providing a theoretical framework useful for analysing and extending the algorithm. We show that the proposed framework suffers from several issues; from undefined or pathological behaviour of the true posterior related to use of improper priors, to an ill-defined variational objective due to singularity of the approximating distribution relative to the true posterior. Our analysis of the improper log uniform prior used in variational Gaussian dropout suggests the pathologies are generally irredeemable, and that the algorithm still works only because the variational formulation annuls some of the pathologies. To address the singularity issue, we proffer Quasi-KL (QKL) divergence, a new approximate inference objective for approximation of high-dimensional distributions. We show that motivations for variational Bernoulli dropout based on discretisation and noise have QKL as a limit. Properties of QKL are studied both theoretically and on a simple practical example which shows that the QKL-optimal approximation of a full rank Gaussian with a degenerate one naturally leads to the Principal Component Analysis solution.
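
The closing remark about approximating a full-rank Gaussian with a degenerate one can be illustrated with plain PCA. The sketch below does not compute the QKL objective; it only shows the PCA solution that the abstract says the QKL-optimal approximation leads to. The covariance `sigma` and the rank `k` are made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
sigma = A @ A.T + 0.1 * np.eye(5)        # hypothetical full-rank covariance

k = 2                                    # rank of the degenerate approximation (assumption)
eigvals, eigvecs = np.linalg.eigh(sigma) # eigenvalues come back in ascending order
top = eigvecs[:, -k:]                    # top-k principal directions
sigma_k = top @ np.diag(eigvals[-k:]) @ top.T   # rank-k covariance spanned by them

# Frobenius-norm gap between the full-rank covariance and its rank-k version
residual = np.linalg.norm(sigma - sigma_k)
print("rank:", np.linalg.matrix_rank(sigma_k), "residual:", residual)
```

The residual equals the norm of the discarded eigenvalues, which is the usual sense in which the principal subspace is the best rank-k summary of the covariance.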

* Extended version of the paper accepted to ICML 2018: more details in the proofs, a few minor modifications

Probabilistic Deep Learning using Random Sum-Product Networks

Jun 22, 2018
Robert Peharz, Antonio Vergari, Karl Stelzner, Alejandro Molina, Martin Trapp, Kristian Kersting, Zoubin Ghahramani

The need for consistent treatment of uncertainty has recently triggered increased interest in probabilistic deep learning methods. However, most current approaches have severe limitations when it comes to inference, since many of these models do not even permit exact evaluation of data likelihoods. Sum-product networks (SPNs), on the other hand, are an excellent architecture in that regard, as they allow efficient evaluation of likelihoods, as well as arbitrary marginalization and conditioning tasks. Nevertheless, SPNs have not been fully explored as serious deep learning models, likely due to their special structural requirements, which complicate learning. In this paper, we make a drastic simplification and use random SPN structures which are trained in a "classical deep learning manner", i.e. employing automatic differentiation, SGD, and GPU support. The resulting models, called RAT-SPNs, yield prediction results comparable to deep neural networks, while still being interpretable as generative models and maintaining well-calibrated uncertainties. This property makes them highly robust under missing input features and enables them to naturally detect outliers and peculiar samples.
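
To make the point about exact likelihoods and marginalization concrete, here is a minimal sketch of a hand-built SPN over two binary variables. It is not a RAT-SPN and involves no learning; the structure, the weights in `WEIGHTS` and the leaf parameters in `LEAF_P` are all made up for illustration.

```python
import numpy as np

# Hypothetical toy SPN over binary X1, X2:
# root = 0.6 * Bern(X1; 0.9) * Bern(X2; 0.2) + 0.4 * Bern(X1; 0.1) * Bern(X2; 0.7)
WEIGHTS = np.array([0.6, 0.4])
LEAF_P = np.array([[0.9, 0.2],
                   [0.1, 0.7]])   # LEAF_P[k, d] = P(X_d = 1) under product node k

def leaf(p, x):
    """Bernoulli leaf; x=None marginalises the variable, so the leaf evaluates to 1."""
    if x is None:
        return 1.0
    return p if x == 1 else 1.0 - p

def spn_prob(x1, x2):
    """One bottom-up pass: product nodes multiply leaves, the sum node mixes them."""
    products = [leaf(LEAF_P[k, 0], x1) * leaf(LEAF_P[k, 1], x2) for k in range(2)]
    return float(np.dot(WEIGHTS, products))

joint = spn_prob(1, 0)           # P(X1=1, X2=0)
marginal = spn_prob(1, None)     # P(X1=1), X2 marginalised out
conditional = joint / marginal   # P(X2=0 | X1=1)
print(joint, marginal, conditional)
```

Each query costs a single pass over the network, which is why missing input features can be handled by marginalising them rather than imputing values.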

The Mirage of Action-Dependent Baselines in Reinforcement Learning

Apr 06, 2018
George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine

Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance. Several recent papers extend the baseline to depend on both the state and action and suggest that this significantly reduces variance and improves sample efficiency without introducing bias into the gradient estimates. To better understand this development, we decompose the variance of the policy gradient estimator and numerically show that learned state-action-dependent baselines do not in fact reduce variance over a state-dependent baseline in commonly tested benchmark domains. We confirm this unexpected result by reviewing the open-source code accompanying these prior papers, and show that subtle implementation decisions cause deviations from the methods presented in the papers and explain the source of the previously observed empirical gains. Furthermore, the variance decomposition highlights areas for improvement, which we demonstrate by illustrating a simple change to the typical value function parameterization that can significantly improve performance.
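
The role of a baseline in reducing estimator variance can be seen in a one-step toy problem. This is a minimal sketch, not the paper's variance decomposition or its benchmarks: the Gaussian policy, the `reward` function and the sample size are assumptions chosen so the effect is easy to observe, and with a single state the baseline reduces to a constant.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.0        # mean of a unit-variance Gaussian policy (toy assumption)
n = 200_000

def reward(a):
    return -(a - 3.0) ** 2    # hypothetical one-step reward

# Baseline estimated from an independent batch, so it does not depend on the
# actions used in the gradient estimate and leaves the estimator unbiased.
baseline = reward(rng.normal(theta, 1.0, size=n)).mean()

actions = rng.normal(theta, 1.0, size=n)
score = actions - theta                          # d/dtheta log N(a; theta, 1)
grad_plain = reward(actions) * score             # score-function estimator
grad_baseline = (reward(actions) - baseline) * score

print("mean:    ", grad_plain.mean(), grad_baseline.mean())
print("variance:", grad_plain.var(), grad_baseline.var())
```

Both estimators target the same gradient; only the per-sample variance differs, and that variance is the quantity the abstract's comparison between baselines is about.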

* Updated to address comments from ICLR workshop reviewers

General Latent Feature Models for Heterogeneous Datasets

Mar 08, 2018
Isabel Valera, Melanie F. Pradier, Maria Lomeli, Zoubin Ghahramani

Latent feature modeling allows capturing the latent structure responsible for generating the observed properties of a set of objects. It is often used to make predictions either for new values of interest or missing information in the original data, as well as to perform exploratory data analysis. However, although there is an extensive literature on latent feature models for homogeneous datasets, where all the attributes that describe each object are of the same (continuous or discrete) nature, there is a lack of work on latent feature modeling for heterogeneous databases. In this paper, we introduce a general Bayesian nonparametric latent feature model suitable for heterogeneous datasets, where the attributes describing each object can be either discrete, continuous or mixed variables. The proposed model presents several important properties. First, it accounts for heterogeneous data while keeping the properties of conjugate models, which allow us to infer the model in linear time with respect to the number of objects and attributes. Second, its Bayesian nonparametric nature allows us to automatically infer the model complexity from the data, i.e., the number of features necessary to capture the latent structure in the data. Third, the latent features in the model are binary-valued variables, easing the interpretability of the obtained latent features in exploratory data analysis. We show the flexibility of the proposed model by solving both prediction and data analysis tasks on several real-world datasets. Moreover, a software package of the GLFM is publicly available for other researchers to use and improve.

* Software library available at https://github.com/ivaleraM/GLFM

Weakly supervised collective feature learning from curated media

Feb 13, 2018
Yusuke Mukuta, Akisato Kimura, David B Adrian, Zoubin Ghahramani

The current state-of-the-art in feature learning relies on the supervised learning of large-scale datasets consisting of target content items and their respective category labels. However, constructing such large-scale fully-labeled datasets generally requires painstaking manual effort. One possible solution to this problem is to employ community contributed text tags as weak labels; however, the concepts underlying a single text tag depend strongly on the user. We instead present a new paradigm for learning discriminative features by making full use of the human curation process on social networking services (SNSs). During the process of content curation, SNS users collect content items manually from various sources and group them by context, all for their own benefit. Due to the nature of this process, we can assume that (1) content items in the same group share the same semantic concept and (2) groups sharing the same images might have related semantic concepts. Through these insights, we can define human curated groups as weak labels from which our proposed framework can learn discriminative features as a representation in the space of semantic concepts the users intended when creating the groups. We show that this feature learning can be formulated as a problem of link prediction for a bipartite graph whose nodes correspond to content items and human curated groups, and propose a novel method for feature learning based on sparse coding or network fine-tuning.

* Published in the Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2018)

Variational Gaussian Dropout is not Bayesian

Nov 08, 2017
Jiri Hron, Alexander G. de G. Matthews, Zoubin Ghahramani

Gaussian multiplicative noise is commonly used as a stochastic regularisation technique in training of deterministic neural networks. A recent paper reinterpreted the technique as a specific algorithm for approximate inference in Bayesian neural networks; several extensions ensued. We show that the log-uniform prior used in all the above publications does not generally induce a proper posterior, and thus Bayesian inference in such models is ill-posed. Independent of the log-uniform prior, the correlated weight noise approximation has further issues leading to either infinite objective or high risk of overfitting. The above implies that the reported sparsity of obtained solutions cannot be explained by Bayesian or the related minimum description length arguments. We thus study the objective from a non-Bayesian perspective, provide its previously unknown analytical form which allows exact gradient evaluation, and show that the later proposed additive reparametrisation introduces minima not present in the original multiplicative parametrisation. Implications and future research directions are discussed.
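
The claim that the log-uniform prior is improper, and that this can carry over to the posterior, can be sketched for a single scalar weight $w$. The lower bound $c$ on the likelihood near zero is an assumption made for illustration; it is not a statement of the paper's exact conditions.

```latex
% Log-uniform prior on a scalar weight: p(w) \propto 1/|w|.
% Its mass diverges both near zero and in the tails:
\int_{\varepsilon}^{1} \frac{\mathrm{d}w}{w} = -\log \varepsilon
  \;\xrightarrow[\varepsilon \to 0]{}\; \infty ,
\qquad
\int_{1}^{R} \frac{\mathrm{d}w}{w} = \log R
  \;\xrightarrow[R \to \infty]{}\; \infty .

% Assumption: the likelihood is bounded below near zero,
% p(\mathcal{D} \mid w) \ge c > 0 for |w| < \delta.
% Then the unnormalised posterior cannot be normalised either:
\int_{0}^{\delta} \frac{p(\mathcal{D} \mid w)}{w} \,\mathrm{d}w
  \;\ge\; c \int_{0}^{\delta} \frac{\mathrm{d}w}{w} \;=\; \infty .
```

Under that assumption the normalising constant is infinite, so there is no proper posterior for an approximate inference scheme to target.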

Magnetic Hamiltonian Monte Carlo

Aug 19, 2017
Nilesh Tripuraneni, Mark Rowland, Zoubin Ghahramani, Richard Turner

Hamiltonian Monte Carlo (HMC) exploits Hamiltonian dynamics to construct efficient proposals for Markov chain Monte Carlo (MCMC). In this paper, we present a generalization of HMC which exploits non-canonical Hamiltonian dynamics. We refer to this algorithm as magnetic HMC, since in 3 dimensions a subset of the dynamics maps onto the mechanics of a charged particle coupled to a magnetic field. We establish a theoretical basis for the use of non-canonical Hamiltonian dynamics in MCMC, and construct a symplectic, leapfrog-like integrator allowing for the implementation of magnetic HMC. Finally, we exhibit several examples where these non-canonical dynamics can lead to improved mixing of magnetic HMC relative to ordinary HMC.
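
For reference, the ordinary leapfrog integrator that the "leapfrog-like" scheme generalises can be sketched as follows. This is standard (canonical) HMC dynamics with an identity mass matrix, not the magnetic integrator from the paper; the Gaussian target, step size and number of steps are illustrative assumptions.

```python
import numpy as np

def leapfrog(q, p, grad_U, step, n_steps):
    """Standard leapfrog for H(q, p) = U(q) + 0.5 * p @ p (identity mass matrix)."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * step * grad_U(q)       # initial half step for momentum
    for i in range(n_steps):
        q += step * p                 # full step for position
        if i < n_steps - 1:
            p -= step * grad_U(q)     # full momentum step between position steps
    p -= 0.5 * step * grad_U(q)       # final half step for momentum
    return q, p

# Toy target: standard Gaussian, U(q) = 0.5 * q @ q
U = lambda q: 0.5 * q @ q
grad_U = lambda q: q
hamiltonian = lambda q, p: U(q) + 0.5 * p @ p

q0 = np.array([1.0, -0.5])
p0 = np.array([0.3, 0.8])
q1, p1 = leapfrog(q0, p0, grad_U, step=0.05, n_steps=40)
energy_error = abs(hamiltonian(q1, p1) - hamiltonian(q0, p0))

# Reversibility: flipping the momentum and integrating again returns to the start.
q_back, p_back = leapfrog(q1, -p1, grad_U, step=0.05, n_steps=40)
print("energy error:", energy_error)
```

The small energy error and the momentum-flip reversibility are the properties that make the proposal usable inside a Metropolis accept/reject step.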

* 34th International Conference on Machine Learning (ICML 2017)

General Latent Feature Modeling for Data Exploration Tasks

Jul 26, 2017
Isabel Valera, Melanie F. Pradier, Zoubin Ghahramani

This paper introduces a general Bayesian nonparametric latent feature model suitable for automatic exploratory analysis of heterogeneous datasets, where the attributes describing each object can be either discrete, continuous or mixed variables. The proposed model presents several important properties. First, it accounts for heterogeneous data while still permitting inference in linear time with respect to the number of objects and attributes. Second, its Bayesian nonparametric nature allows us to automatically infer the model complexity from the data, i.e., the number of features necessary to capture the latent structure in the data. Third, the latent features in the model are binary-valued variables, easing the interpretability of the obtained latent features in data exploration tasks.

* Presented at the 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia

Improving Output Uncertainty Estimation and Generalization in Deep Learning via Neural Network Gaussian Processes

Jul 19, 2017
Tomoharu Iwata, Zoubin Ghahramani

We propose a simple method that combines neural networks and Gaussian processes. The proposed method can estimate the uncertainty of outputs and flexibly adjust target functions where training data exist, which are advantages of Gaussian processes. The proposed method can also achieve high generalization performance for unseen input configurations, which is an advantage of neural networks. With the proposed method, neural networks are used for the mean functions of Gaussian processes. We present a scalable stochastic inference procedure, where sparse Gaussian processes are inferred by stochastic variational inference, and the parameters of neural networks and kernels are estimated by stochastic gradient descent methods, simultaneously. We use two real-world spatio-temporal data sets to demonstrate experimentally that the proposed method achieves better uncertainty estimation and generalization performance than neural networks and Gaussian processes.
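
The idea of using a network as the GP mean function can be sketched with exact GP regression. This is a simplified illustration, not the paper's procedure: it uses a dense GP rather than a sparse variational one, and `mean_fn` is a fixed hypothetical function standing in for a trained neural network. The kernel, noise level and data are likewise assumptions.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def mean_fn(x):
    # Stand-in for a trained neural network mean (hypothetical fixed function).
    return 0.5 * x

def gp_predict(x_train, y_train, x_test, noise=1e-2):
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_test)
    resid = y_train - mean_fn(x_train)      # the GP models deviations from the mean
    mu = mean_fn(x_test) + K_s.T @ np.linalg.solve(K, resid)
    cov = rbf(x_test, x_test) - K_s.T @ np.linalg.solve(K, K_s)
    return mu, np.diag(cov)

x_train = np.array([-1.0, 0.0, 1.0])
y_train = np.sin(x_train)
x_test = np.array([0.0, 10.0])              # one point near the data, one far away
mu, var = gp_predict(x_train, y_train, x_test)
print("mean:", mu, "variance:", var)
```

Near the training inputs the GP correction dominates and the predictive variance is small; far from them the prediction falls back to the mean function with the prior variance, which matches the division of labour described in the abstract.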

One-Shot Learning in Discriminative Neural Networks

Jul 18, 2017
Jordan Burgess, James Robert Lloyd, Zoubin Ghahramani

We consider the task of one-shot learning of visual categories. In this paper we explore a Bayesian procedure for updating a pretrained convnet to classify a novel image category for which data is limited. We decompose this convnet into a fixed feature extractor and softmax classifier. We assume that the target weights for the new task come from the same distribution as the pretrained softmax weights, which we model as a multivariate Gaussian. By using this as a prior for the new weights, we demonstrate competitive performance with state-of-the-art methods whilst also being consistent with 'normal' methods for training deep networks on large data.
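
The prior construction described above, a multivariate Gaussian fitted to the pretrained softmax weights, can be sketched as below. The weight matrix `W_pre` is random data standing in for real pretrained weights, the jitter term is an added assumption for numerical stability, and the sketch stops at evaluating the log-prior rather than performing the paper's full update for a new class.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical pretrained softmax weights: 100 classes x 8 features.
W_pre = rng.normal(loc=0.2, scale=0.5, size=(100, 8))

# Fit a multivariate Gaussian to the per-class weight vectors.
mu = W_pre.mean(axis=0)
cov = np.cov(W_pre, rowvar=False) + 1e-6 * np.eye(W_pre.shape[1])  # jitter (assumption)

def log_prior(w):
    """Log-density of N(mu, cov) at a candidate weight vector for a new class."""
    d = w - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + len(w) * np.log(2 * np.pi))

w_typical = mu
w_unusual = mu + 1.0
print(log_prior(w_typical), log_prior(w_unusual))
```

Adding `log_prior(w)` to the log-likelihood of the few available examples gives a MAP-style objective in which weights resembling those of existing classes are favoured.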

* 3 pages, 3 figures
