David Barber

University College London

Gaussian Mean Field Regularizes by Limiting Learned Information

Feb 12, 2019

Julius Kunze, Louis Kirsch, Hippolyt Ritter, David Barber

Abstract: Variational inference with a factorized Gaussian posterior estimate is a widely used approach for learning parameters and hidden variables. Empirically, a regularizing effect can be observed that is poorly understood. In this work, we show how mean field inference improves generalization by limiting mutual information between learned parameters and the data through noise. We quantify a maximum capacity when the posterior variance is either fixed or learned and connect it to generalization error, even when the KL-divergence in the objective is rescaled. Our experiments demonstrate that bounding information between parameters and data effectively regularizes neural networks on both supervised and unsupervised tasks.
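
As a rough illustration of the "KL-divergence in the objective" and its rescaling, the following Python sketch computes the closed-form KL divergence between a factorized Gaussian posterior and a standard normal prior, then adds it to a data term with a scaling factor. The standard normal prior, the function names, and the `beta` parameter are assumptions made for this example; the abstract does not specify them, and this is not the authors' code.

```python
import numpy as np

def mean_field_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in nats.

    Assumes a standard normal prior (an illustrative choice).
    """
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * float(np.sum(mu**2 + np.exp(log_var) - 1.0 - log_var))

def rescaled_objective(expected_nll, mu, log_var, beta=1.0):
    """Expected negative log-likelihood plus a KL term scaled by beta.

    beta is a hypothetical name for the rescaling factor; beta = 1
    gives the usual negative variational lower bound.
    """
    return expected_nll + beta * mean_field_kl(mu, log_var)
```

With the posterior equal to the prior (`mu = 0`, `log_var = 0`) the KL term vanishes, so the parameters carry no information about the data under this measure.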

Practical Lossless Compression with Latent Variables using Bits Back Coding

Jan 15, 2019

James Townsend, Tom Bird, David Barber

Abstract: Deep latent variable models have seen recent success in many data domains. Lossless compression is an application of these models which, despite having the potential to be highly useful, has yet to be implemented in a practical manner. We present 'Bits Back with ANS' (BB-ANS), a scheme to perform lossless compression with latent variable models at a near-optimal rate. We demonstrate this scheme by using it to compress the MNIST dataset with a variational auto-encoder model (VAE), achieving compression rates superior to standard methods with only a simple VAE. Given that the scheme is highly amenable to parallelization, we conclude that with a sufficiently high quality generative model this scheme could be used to achieve substantial improvements in compression rate with acceptable running time. We make our implementation available open source at https://github.com/bits-back/bits-back.
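
To make the "near-optimal rate" concrete, here is a small Python sketch of the expected net code length under the general bits-back argument, for a toy model with discrete latents. It ignores all ANS implementation details and is not taken from the linked repository; the function names and the table-based model representation are assumptions for illustration only.

```python
import numpy as np

def exact_posterior(x, prior, likelihood):
    """p(z | x) for a discrete model; likelihood[z, x] = p(x | z)."""
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    joint = prior * likelihood[:, x]
    return joint / joint.sum()

def bits_back_length(x, prior, likelihood, q):
    """Expected net code length in bits for symbol x.

    Encoding costs -log2 p(x|z) - log2 p(z) bits, and log2 q(z|x) bits
    are recovered, so the expectation under q is the negative
    variational lower bound measured in bits.
    """
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = q > 0  # terms with q(z|x) = 0 contribute nothing
    lik_x = likelihood[:, x]
    terms = np.log2(q[mask]) - np.log2(prior[mask]) - np.log2(lik_x[mask])
    return float(np.sum(q[mask] * terms))
```

When `q` is the exact posterior, the net length equals `-log2 p(x)`, the ideal code length under the model; any other `q` costs extra bits equal to the KL divergence from `q` to the exact posterior.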

Spread Divergences

Dec 02, 2018

David Barber, Mingtian Zhang, Raza Habib, Thomas Bird

Abstract: For distributions p and q with different support, the divergence generally will not exist. We define a spread divergence on modified p and q and describe sufficient conditions for the existence of such a divergence. We give examples of using a spread divergence to train implicit generative models, including linear models (Principal Components Analysis and Independent Components Analysis) and non-linear models (Deep Generative Networks).
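
A minimal worked example of the idea, assuming the "modification" is additive Gaussian noise of equal variance applied to both distributions (an assumption for this sketch; the abstract does not name the noise model). Two point masses at different locations have no well-defined KL divergence, but after spreading they become Gaussians whose KL divergence is finite. Function names are hypothetical.

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL( N(mu_p, var_p) || N(mu_q, var_q) ) for 1-D Gaussians, in nats."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def spread_kl_point_masses(a, b, sigma):
    """KL between point masses at a and b after adding N(0, sigma^2) noise.

    Each point mass becomes a Gaussian with variance sigma^2, so the
    result reduces to (a - b)^2 / (2 sigma^2).
    """
    return gaussian_kl(a, sigma**2, b, sigma**2)
```

The spread value is zero exactly when the two point masses coincide, which is the property one wants from a divergence between the original distributions.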

Modular Networks: Learning to Decompose Neural Computation

Nov 13, 2018

Louis Kirsch, Julius Kunze, David Barber

Abstract: Scaling model capacity has been vital in the success of deep learning. For a typical network, necessary compute resources and training time grow dramatically with model size. Conditional computation is a promising way to increase the number of parameters with a relatively small increase in resources. We propose a training algorithm that flexibly chooses neural modules based on the data to be processed. Both the decomposition and modules are learned end-to-end. In contrast to existing approaches, training does not rely on regularization to enforce diversity in module use. We apply modular networks both to image recognition and language modeling tasks, where we achieve superior performance compared to several baselines. Introspection reveals that modules specialize in interpretable contexts.

* NIPS 2018

Stochastic Variational Optimization

Sep 13, 2018

Thomas Bird, Julius Kunze, David Barber

Abstract: Variational Optimization forms a differentiable upper bound on an objective. We show that approaches such as Natural Evolution Strategies and Gaussian Perturbation are special cases of Variational Optimization in which the expectations are approximated by Gaussian sampling. These approaches are of particular interest because they are parallelizable. We calculate the approximate bias and variance of the corresponding gradient estimators and demonstrate that using antithetic sampling or a baseline is crucial to mitigate their problems. We contrast these methods with an alternative parallelizable method, namely Directional Derivatives. We conclude that, for differentiable objectives, using Directional Derivatives is preferable to using Variational Optimization to perform parallel Stochastic Gradient Descent.
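
The role of antithetic sampling can be sketched in a few lines of Python. This example assumes an isotropic Gaussian with fixed standard deviation `sigma` and estimates only the gradient with respect to the mean; the function name and signature are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def vo_gradient_samples(f, mu, sigma, n_samples, rng, antithetic=True):
    """Per-sample estimates of the gradient of E[f(mu + sigma * eps)] w.r.t. mu.

    Plain estimator:      f(mu + sigma*eps) * eps / sigma
    Antithetic estimator: (f(mu + sigma*eps) - f(mu - sigma*eps)) * eps / (2*sigma)
    Both are unbiased for the gradient of the Gaussian-smoothed objective.
    """
    eps = rng.standard_normal((n_samples, mu.size))
    f_plus = np.array([f(mu + sigma * e) for e in eps])
    if antithetic:
        # Pair each perturbation with its mirror image.
        f_minus = np.array([f(mu - sigma * e) for e in eps])
        weights = (f_plus - f_minus) / (2.0 * sigma)
    else:
        weights = f_plus / sigma
    return weights[:, None] * eps  # shape (n_samples, dim)

# Usage: a quadratic objective, whose smoothed gradient is 2 * mu.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
quad = lambda x: float(np.sum(x**2))
grad_anti = vo_gradient_samples(quad, mu, 0.1, 20000, rng, antithetic=True)
grad_plain = vo_gradient_samples(quad, mu, 0.1, 20000, rng, antithetic=False)
```

Averaging `grad_anti` over samples gives an estimate close to `2 * mu`, and its per-sample variance is far smaller than that of `grad_plain`, because the mirrored evaluation cancels the large constant part of `f` that the plain estimator multiplies by noise.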

Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers

Sep 10, 2018

Zhen He, Jian Li, Daxue Liu, Hangen He, David Barber

Abstract: Online Multi-Object Tracking (MOT) from videos is a challenging computer vision task which has been extensively studied for decades. Most of the existing MOT algorithms are based on the Tracking-by-Detection (TBD) paradigm combined with popular machine learning approaches which largely reduce the human effort to tune algorithm parameters. However, the commonly used supervised learning approaches require labeled data (e.g., bounding boxes), which is expensive to obtain for videos. Also, the TBD framework is usually suboptimal since it is not end-to-end, i.e., it treats detection and tracking as separate tasks rather than solving them jointly. To achieve both label-free and end-to-end learning of MOT, we propose a Tracking-by-Animation framework, where a differentiable neural model first tracks objects from input frames and then animates these objects into reconstructed frames. Learning is then driven by the reconstruction error through backpropagation. We further propose a Reprioritized Attentive Tracking to improve the robustness of data association. Experiments conducted on both synthetic and real video datasets show the potential of the proposed model.

* Submitted to AAAI 2019

Generative Neural Machine Translation

Jun 13, 2018

Harshil Shah, David Barber

Abstract: We introduce Generative Neural Machine Translation (GNMT), a latent variable architecture which is designed to model the semantics of the source and target sentences. We modify an encoder-decoder translation model by adding a latent variable as a language-agnostic representation which is encouraged to learn the meaning of the sentence. GNMT achieves competitive BLEU scores on pure translation tasks, and is superior when there are missing words in the source sentence. We augment the model to facilitate multilingual translation and semi-supervised learning without adding parameters. This framework significantly reduces overfitting when there is limited paired data available, and is effective for translating between pairs of languages not seen during training.

Generating Sentences Using a Dynamic Canvas

Jun 13, 2018

Harshil Shah, Bowen Zheng, David Barber

Abstract: We introduce the Attentive Unsupervised Text (W)riter (AUTR), which is a word-level generative model for natural language. It uses a recurrent neural network with a dynamic attention and canvas memory mechanism to iteratively construct sentences. By viewing the state of the memory at intermediate stages and where the model is placing its attention, we gain insight into how it constructs sentences. We demonstrate that AUTR learns a meaningful latent representation for each sentence, and achieves competitive log-likelihood lower bounds whilst being computationally efficient. It is effective at generating and reconstructing sentences, as well as imputing missing words.

* AAAI 2018

Improving latent variable descriptiveness with AutoGen

Jun 12, 2018

Alex Mansbridge, Roberto Fierimonte, Ilya Feige, David Barber

Abstract: Powerful generative models, particularly in Natural Language Modelling, are commonly trained by maximizing a variational lower bound on the data log likelihood. These models often suffer from poor use of their latent variable, with ad-hoc annealing factors used to encourage retention of information in the latent variable. We discuss an alternative and general approach to latent variable modelling, based on an objective that combines the data log likelihood as well as the likelihood of a perfect reconstruction through an autoencoder. Tying these together ensures by design that the latent variable captures information about the observations, whilst retaining the ability to generate well. Interestingly, though this approach is a priori unrelated to VAEs, the lower bound attained is identical to the standard VAE bound but with the addition of a simple pre-factor, thus providing a formal interpretation of the commonly used, ad-hoc pre-factors in training VAEs.

* 8 pages, 2 figures, 5 tables

Gaussian mixture models with Wasserstein distance

Jun 12, 2018

Benoit Gaujac, Ilya Feige, David Barber

Abstract: Generative models with both discrete and continuous latent variables are highly motivated by the structure of many real-world data sets. They present, however, subtleties in training, often manifesting in the discrete latent being under-leveraged. In this paper, we show that such models are more amenable to training when using the Optimal Transport framework of Wasserstein Autoencoders. We find our discrete latent variable to be fully leveraged by the model when trained, without any modifications to the objective function or significant fine-tuning. Our model generates comparable samples to other approaches while using relatively simple neural networks, since the discrete latent variable carries much of the descriptive burden. Furthermore, the discrete latent provides significant control over generation.

* 8 pages, 5 figures
