Abstract: How can we efficiently propagate uncertainty in a latent state representation with recurrent neural networks? This paper introduces stochastic recurrent neural networks, which glue a deterministic recurrent neural network and a state space model together to form a stochastic, sequential neural generative model. The clear separation of deterministic and stochastic layers allows a structured variational inference network to track the factorization of the model's posterior distribution. By retaining the nonlinear recursive structure of a recurrent neural network while averaging over the uncertainty in a latent path, like a state space model, we improve on the state-of-the-art results on the Blizzard and TIMIT speech modeling data sets by a large margin, while achieving comparable performance to competing methods on polyphonic music modeling.
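A minimal sketch (Python/PyTorch) of the layering this abstract describes: a deterministic GRU recursion with a stochastic Gaussian latent layer on top. The module names (prior_net, decoder_net) and dimensions are illustrative, not the paper's exact parameterization.

    import torch
    import torch.nn as nn

    class SRNNSketch(nn.Module):
        def __init__(self, x_dim, d_dim, z_dim):
            super().__init__()
            self.rnn = nn.GRUCell(x_dim, d_dim)                      # deterministic layer
            self.prior_net = nn.Linear(d_dim + z_dim, 2 * z_dim)     # p(z_t | d_t, z_{t-1})
            self.decoder_net = nn.Linear(d_dim + z_dim, 2 * x_dim)   # p(x_t | d_t, z_t)

        def step(self, x_prev, d_prev, z_prev):
            d_t = self.rnn(x_prev, d_prev)                           # nonlinear recursive structure
            mu_z, logvar_z = self.prior_net(torch.cat([d_t, z_prev], -1)).chunk(2, -1)
            z_t = mu_z + torch.randn_like(mu_z) * (0.5 * logvar_z).exp()  # sample the latent path
            mu_x, logvar_x = self.decoder_net(torch.cat([d_t, z_t], -1)).chunk(2, -1)
            return d_t, z_t, (mu_x, logvar_x)                        # emission parameters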
Abstract: Most existing Neural Machine Translation models use groups of characters or whole words as their unit of input and output. We propose a model with a hierarchical char2word encoder that takes individual characters as both input and output. We first argue that this hierarchical character encoding reduces computational complexity, and show that it improves translation performance. Secondly, by qualitatively studying attention plots from the decoder, we find that the model learns to compress common words into a single embedding, whereas rare words, such as names and places, are represented character by character.
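A sketch (Python/PyTorch) of a hierarchical char2word encoder under the assumption that a character-level GRU summarizes each word into one vector, which a word-level GRU then encodes; dimensions and names are illustrative.

    import torch
    import torch.nn as nn

    class Char2WordEncoder(nn.Module):
        def __init__(self, n_chars, char_dim=64, word_dim=256):
            super().__init__()
            self.char_emb = nn.Embedding(n_chars, char_dim)
            self.char_rnn = nn.GRU(char_dim, word_dim, batch_first=True)  # reads characters
            self.word_rnn = nn.GRU(word_dim, word_dim, batch_first=True)  # reads word vectors

        def forward(self, words):
            # words: list of 1-D LongTensors, one tensor of character ids per word
            word_vecs = []
            for w in words:
                _, h = self.char_rnn(self.char_emb(w).unsqueeze(0))  # final state summarizes the word
                word_vecs.append(h[-1])
            word_seq = torch.stack(word_vecs, dim=1)                 # (1, n_words, word_dim)
            return self.word_rnn(word_seq)                           # sentence-level encoding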
Abstract: We investigate the problem of approximate Bayesian inference for a general class of observation models by means of the expectation propagation (EP) framework for large systems under some statistical assumptions. Our approach aims to overcome the numerical bottleneck of EP caused by the inversion of large matrices. Assuming that the measurement matrices are realizations of specific types of ensembles, we use the concept of freeness from random matrix theory to show that the EP cavity variances exhibit an asymptotic self-averaging property. They can be pre-computed using specific generating functions, i.e. the R- and/or S-transforms in free probability, which do not require matrix inversions. Our approach extends the framework of (generalized) approximate message passing, which assumes zero-mean i.i.d. entries of the measurement matrix, to a general class of random matrix ensembles. The generalization is via a simple formulation of the R- and/or S-transforms of the limiting eigenvalue distribution of the Gramian of the measurement matrix. We demonstrate the performance of our approach on a signal recovery problem of nonlinear compressed sensing and compare it with that of EP.
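For reference, the free-probability objects this abstract appeals to can be sketched in their textbook form (generic notation, not the paper's): for a limiting eigenvalue distribution \mu of the Gramian, the Cauchy transform and R-transform satisfy

    \begin{align}
    G_\mu(z) = \int \frac{\mu(\mathrm{d}\lambda)}{z-\lambda},
    \qquad
    R_\mu\big(G_\mu(z)\big) + \frac{1}{G_\mu(z)} = z,
    \end{align}

and freeness gives the additivity/multiplicativity properties R_{\mu \boxplus \nu} = R_\mu + R_\nu and S_{\mu \boxtimes \nu} = S_\mu S_\nu, which is the kind of spectral bookkeeping that lets cavity-variance quantities be pre-computed without inverting large matrices.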
Abstract: The estimation of normalizing constants is a fundamental step in probabilistic model comparison. Sequential Monte Carlo methods may be used for this task and have the advantage of being inherently parallelizable. However, the standard choice of using a fixed number of particles at each iteration is suboptimal because some steps contribute disproportionately to the variance of the estimate. We introduce an adaptive version of the Resample-Move algorithm, in which the particle set is adaptively expanded whenever a better approximation of an intermediate distribution is needed. The algorithm builds on the expression for the optimal number of particles and the corresponding minimum variance found under ideal conditions. Benchmark results on challenging Gaussian Process Classification and Restricted Boltzmann Machine applications show that Adaptive Resample-Move (ARM) estimates the normalizing constant with a smaller variance, using fewer computational resources, than either Resample-Move with a fixed number of particles or Annealed Importance Sampling. A further advantage over Annealed Importance Sampling is that ARM is easier to tune.
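A sketch (Python/NumPy) of the generic Resample-Move-style estimate of a normalizing constant that such methods build on: the product of average incremental weights across a sequence of intermediate distributions. The adaptive expansion of the particle set, which is the paper's contribution, is omitted; log_gamma and propagate (the MCMC move kernel) are hypothetical stand-ins.

    import numpy as np

    def smc_log_normalizer(particles, log_gamma, propagate, rng=np.random.default_rng(0)):
        # particles: (N, d) array; log_gamma: list of unnormalized log-densities
        # bridging the prior (t = 0) to the target (t = T).
        log_Z = 0.0
        for t in range(1, len(log_gamma)):
            log_w = log_gamma[t](particles) - log_gamma[t - 1](particles)   # incremental weights
            log_Z += np.logaddexp.reduce(log_w) - np.log(len(particles))    # accumulate log(Z_t / Z_{t-1})
            probs = np.exp(log_w - np.logaddexp.reduce(log_w))
            idx = rng.choice(len(particles), size=len(particles), p=probs)  # resample step
            particles = propagate(particles[idx], log_gamma[t])             # move step (MCMC kernel)
        return log_Z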
Abstract: Deep generative models parameterized by neural networks have recently achieved state-of-the-art performance in unsupervised and semi-supervised learning. We extend deep generative models with auxiliary variables, which improve the variational approximation. The auxiliary variables leave the generative model unchanged but make the variational distribution more expressive. Inspired by the structure of the auxiliary variables, we also propose a model with two stochastic layers and skip connections. Our findings suggest that more expressive and properly specified deep generative models converge faster and achieve better results. We show state-of-the-art performance within semi-supervised learning on the MNIST, SVHN and NORB datasets.
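In generic notation (not necessarily the paper's), the auxiliary-variable construction keeps p(x, z) intact and instead augments the bound with an auxiliary model r(a | x, z) and a richer variational distribution q(a, z | x) = q(a | x) q(z | a, x):

    \begin{align}
    \log p(x) \;\ge\; \mathbb{E}_{q(a,z\mid x)}\!\left[\log \frac{p(x,z)\, r(a\mid x,z)}{q(a\mid x)\, q(z\mid a,x)}\right],
    \end{align}

so the generative model is unchanged while the marginal q(z | x) becomes a mixture over a and hence more expressive.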
Abstract: Variational Autoencoders are powerful models for unsupervised learning. However, deep models with several layers of dependent stochastic variables are difficult to train, which limits the improvements obtained with these highly expressive models. We propose a new inference model, the Ladder Variational Autoencoder, that recursively corrects the generative distribution by a data-dependent approximate likelihood in a process resembling the recently proposed Ladder Network. We show that this model provides state-of-the-art predictive log-likelihood and a tighter log-likelihood lower bound compared to the purely bottom-up inference in layered Variational Autoencoders and other generative models. We provide a detailed analysis of the learned hierarchical latent representation and show that our new inference model is qualitatively different and utilizes a deeper, more distributed hierarchy of latent variables. Finally, we observe that batch normalization and deterministic warm-up (gradually turning on the KL term) are crucial for training variational models with many stochastic layers.
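A sketch (Python/PyTorch) of the precision-weighted combination used to correct a top-down (generative) Gaussian with a bottom-up, data-dependent one; parameterization details here are illustrative.

    import torch

    def precision_weighted_merge(mu_bu, logvar_bu, mu_td, logvar_td):
        # Combine bottom-up (data-dependent) and top-down (generative) Gaussians
        # by adding precisions and precision-weighting the means.
        prec_bu, prec_td = torch.exp(-logvar_bu), torch.exp(-logvar_td)
        var_q = 1.0 / (prec_bu + prec_td)
        mu_q = var_q * (mu_bu * prec_bu + mu_td * prec_td)
        return mu_q, torch.log(var_q)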
Abstract: The future predictive performance of a Bayesian model can be estimated using Bayesian cross-validation. In this article, we consider Gaussian latent variable models where the integration over the latent values is approximated using the Laplace method or expectation propagation (EP). We study the properties of several Bayesian leave-one-out (LOO) cross-validation approximations that in most cases can be computed with a small additional cost after forming the posterior approximation given the full data. Our main objective is to assess the accuracy of the approximate LOO cross-validation estimators; that is, for each method (Laplace and EP) we compare the approximate fast computation with the exact brute-force LOO computation. Secondarily, we evaluate the accuracy of the Laplace and EP approximations themselves against a ground truth established through extensive Markov chain Monte Carlo simulation. Our empirical results show that the approach based on a Gaussian approximation to the LOO marginal distribution (the so-called cavity distribution) gives the most accurate and reliable results among the fast methods.
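The cavity-based approximation referred to at the end can be sketched as follows (generic notation): with site approximations \tilde t_i and approximate posterior marginal q(f_i) of the latent value f_i, the LOO predictive density for observation i is estimated as

    \begin{align}
    p(y_i \mid y_{-i}) \;\approx\; \int p(y_i \mid f_i)\, q_{-i}(f_i)\, \mathrm{d}f_i,
    \qquad
    q_{-i}(f_i) \propto \frac{q(f_i)}{\tilde t_i(f_i)},
    \end{align}

i.e. the observation's own site term is divided out to form the Gaussian cavity distribution, so no refitting is required for each held-out point.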
Abstract: We present an autoencoder that leverages learned representations to better measure similarities in data space. By combining a variational autoencoder (VAE) with a generative adversarial network (GAN), we can use the learned feature representations in the GAN discriminator as the basis for the VAE reconstruction objective. Thereby, we replace element-wise errors with feature-wise errors to better capture the data distribution while offering invariance to, e.g., translation. We apply our method to images of faces and show that it outperforms VAEs with element-wise similarity measures in terms of visual fidelity. Moreover, we show that the method learns an embedding in which high-level abstract visual features (e.g. wearing glasses) can be modified using simple arithmetic.
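A sketch (Python/PyTorch) of the feature-wise reconstruction term: original and reconstruction are compared in the feature space of an intermediate discriminator layer rather than pixel by pixel. disc_features is an assumed helper returning such activations; the adversarial real/fake terms are trained alongside and omitted here.

    import torch
    import torch.nn.functional as F

    def vaegan_losses(x, x_recon, mu, logvar, disc_features):
        feat_real = disc_features(x)            # discriminator features of the data
        feat_recon = disc_features(x_recon)     # discriminator features of the reconstruction
        recon_loss = F.mse_loss(feat_recon, feat_real)                  # feature-wise error
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())   # standard VAE KL term
        return recon_loss, kl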
Abstract: We integrate the recently proposed spatial transformer network (SPN) [Jaderberg et al., 2015] into a recurrent neural network (RNN) to form an RNN-SPN model. We use the RNN-SPN to classify digits in cluttered MNIST sequences. The proposed model achieves a single-digit error of 1.5%, compared to 2.9% for a convolutional network and 2.0% for a convolutional network with SPN layers. The SPN outputs a zoomed, rotated and skewed version of the input image. We investigate different down-sampling factors (ratio of pixels in input and output) for the SPN and show that the RNN-SPN model is able to down-sample the input images without deteriorating performance. The down-sampling in the RNN-SPN can be thought of as adaptive down-sampling that minimizes the information loss in the regions of interest. We attribute the superior performance of the RNN-SPN to the fact that it can attend to a sequence of regions of interest.
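A sketch (Python/PyTorch) of one RNN-SPN step under the assumption of an affine transformer: the recurrent state predicts a 2x3 affine transform that extracts a zoomed/rotated glimpse, with the glimpse size controlling the down-sampling factor. Names and dimensions are illustrative.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RNNSPNStep(nn.Module):
        def __init__(self, feat_dim, hid_dim, glimpse_hw=(26, 26)):
            super().__init__()
            self.rnn = nn.GRUCell(feat_dim, hid_dim)
            self.to_theta = nn.Linear(hid_dim, 6)       # parameters of a 2x3 affine transform
            self.glimpse_hw = glimpse_hw

        def forward(self, feat, h, image):
            h = self.rnn(feat, h)
            theta = self.to_theta(h).view(-1, 2, 3)
            out_size = torch.Size([image.size(0), image.size(1), *self.glimpse_hw])
            grid = F.affine_grid(theta, out_size, align_corners=False)
            glimpse = F.grid_sample(image, grid, align_corners=False)  # zoomed/rotated/skewed crop
            return h, glimpse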
Abstract: We are interested in solving the multiple measurement vector (MMV) problem for instances where the underlying sparsity pattern exhibits spatio-temporal structure, motivated by the electroencephalogram (EEG) source localization problem. We propose a probabilistic model that takes this structure into account by generalizing the structured spike and slab prior and the associated expectation propagation inference scheme. Based on numerical experiments, we demonstrate the viability of the model and the approximate inference scheme.
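One way to write down such a prior (a hedged sketch in generic notation, following the spike-and-slab construction the abstract generalizes): each coefficient x_i is governed by a latent support variable, and the support variables are coupled through a multivariate Gaussian whose covariance encodes the spatio-temporal structure,

    \begin{align}
    p(x_i \mid \gamma_i) = \big(1-\Phi(\gamma_i)\big)\,\delta(x_i) + \Phi(\gamma_i)\,\mathcal{N}(x_i;\, 0, \tau_0),
    \qquad
    \gamma \sim \mathcal{N}(\mu_0, \Sigma_0),
    \end{align}

where \Phi is the standard normal CDF and \Sigma_0 correlates the support probabilities across space and time.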