Aaron Courville

Neural Autoregressive Flows

Apr 03, 2018
Chin-Wei Huang, David Krueger, Alexandre Lacoste, Aaron Courville

Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST.

* 16 pages, 10 figures, 3 tables
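As a rough illustration of the "monotonic neural network" idea in the abstract above, the sketch below builds a strictly increasing univariate map from a convex mixture of sigmoids. The sigmoid-mixture parameterization, the function name `monotonic_transform`, and all parameter names are assumptions made for this example, not details taken from the listing. In an autoregressive flow the parameters would presumably be produced by a conditioner network from the preceding variables; here they are passed in directly.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def monotonic_transform(x, raw_slopes, biases, raw_weights):
    """Strictly increasing map from R to R (hypothetical parameterization).

    Positive slopes and mixture weights that sum to one make the mixture of
    sigmoids increasing in x; the final logit maps (0, 1) back to R.
    """
    slopes = [math.exp(a) for a in raw_slopes]  # exp keeps every slope positive
    weights = softmax(raw_weights)              # positive, sums to one
    mix = sum(w * sigmoid(a * x + b) for w, a, b in zip(weights, slopes, biases))
    return math.log(mix) - math.log(1.0 - mix)


if __name__ == "__main__":
    params = dict(raw_slopes=[0.0, 1.0, -0.5],
                  biases=[-2.0, 0.0, 2.0],
                  raw_weights=[0.1, 0.2, 0.3])
    for x in [-3.0, -1.0, 0.0, 1.0, 3.0]:
        print(x, monotonic_transform(x, **params))
```

With a single unit of slope 1 and zero bias the map reduces to the identity, which corresponds to the simplest affine case; adding units lets the map bend, which is what would allow a unimodal base density to be pushed toward a multimodal one.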

Generating Contradictory, Neutral, and Entailing Sentences

Mar 07, 2018
Yikang Shen, Shawn Tan, Chin-Wei Huang, Aaron Courville

Learning distributed sentence representations remains an interesting problem in the field of Natural Language Processing (NLP). We want to learn a model that approximates the conditional latent space over the representations of a logical antecedent of the given statement. In our paper, we propose an approach to generating sentences, conditioned on an input sentence and a logical inference label. We do this by modeling the different possibilities for the output sentence as a distribution over the latent representation, which we train using an adversarial objective. We evaluate the model using two state-of-the-art models for the Recognizing Textual Entailment (RTE) task, and measure the BLEU scores against the actual sentences as a probe for the diversity of sentences produced by our model. The experiment results show that, given our framework, we have clear ways to improve the quality and diversity of generated sentences.

Neural Language Modeling by Jointly Learning Syntax and Lexicon

Feb 19, 2018
Yikang Shen, Zhouhan Lin, Chin-Wei Huang, Aaron Courville

We propose a neural language model capable of unsupervised syntactic structure induction. The model leverages the structure information to form better semantic representations and better language modeling. Standard recurrent neural networks are limited by their structure and fail to efficiently use syntactic information. On the other hand, tree-structured recursive networks usually require additional structural supervision at the cost of human expert annotation. In this paper, we propose a novel neural language model, called the Parsing-Reading-Predict Networks (PRPN), that can simultaneously induce the syntactic structure from unannotated sentences and leverage the inferred structure to learn a better language model. In our model, the gradient can be directly back-propagated from the language model loss into the neural parsing network. Experiments show that the proposed model can discover the underlying syntactic structure and achieve state-of-the-art performance on word/character-level language model tasks.

* 16 pages, 5 figures, ICLR 2018

Hierarchical Adversarially Learned Inference

Feb 04, 2018
Mohamed Ishmael Belghazi, Sai Rajeswar, Olivier Mastropietro, Negar Rostamzadeh, Jovana Mitrovic, Aaron Courville

We propose a novel hierarchical generative model with a simple Markovian structure and a corresponding inference model. Both the generative and inference models are trained using the adversarial learning paradigm. We demonstrate that the hierarchical structure supports the learning of progressively more abstract representations as well as providing semantically meaningful reconstructions with different levels of fidelity. Furthermore, we show that minimizing the Jensen-Shannon divergence between the generative and inference network is enough to minimize the reconstruction error. The resulting semantically meaningful hierarchical latent structure discovery is exemplified on the CelebA dataset. There, we show that the features learned by our model in an unsupervised way outperform the best handcrafted features. Furthermore, the extracted features remain competitive when compared to several recent deep supervised approaches on an attribute prediction task on CelebA. Finally, we leverage the model's inference network to achieve state-of-the-art performance on a semi-supervised variant of the MNIST digit classification task.

* 18 pages, 7 figures

Efficient EM Training of Gaussian Mixtures with Missing Data

Jan 08, 2018
Olivier Delalleau, Aaron Courville, Yoshua Bengio

In data-mining applications, we are frequently faced with a large fraction of missing entries in the data matrix, which is problematic for most discriminant machine learning algorithms. A solution that we explore in this paper is the use of a generative model (a mixture of Gaussians) to compute the conditional expectation of the missing variables given the observed variables. Since training a Gaussian mixture with many different patterns of missing values can be computationally very expensive, we introduce a spanning-tree based algorithm that significantly speeds up training in these conditions. We also observe that good results can be obtained by using the generative model to fill in the missing values for a separate discriminant learning algorithm.
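The conditional expectation mentioned in the abstract above can be sketched for the smallest interesting case: a two-dimensional Gaussian mixture in which the first coordinate is observed and the second is missing. This is only the standard imputation formula for an already trained mixture; it does not implement the spanning-tree training algorithm, and the function names and argument layout are assumptions made for this example.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def impute_second_coordinate(x_obs, weights, means, covs):
    """E[x2 | x1 = x_obs] under a 2-D Gaussian mixture (illustrative helper).

    weights[k] is the mixing proportion of component k,
    means[k] = (mu1, mu2), covs[k] = ((s11, s12), (s12, s22)).
    """
    # Responsibility of each component given only the observed coordinate.
    resp = [w * normal_pdf(x_obs, m[0], c[0][0])
            for w, m, c in zip(weights, means, covs)]
    total = sum(resp)
    resp = [r / total for r in resp]
    # Gaussian conditional mean within each component.
    cond = [m[1] + c[0][1] / c[0][0] * (x_obs - m[0])
            for m, c in zip(means, covs)]
    # Mix the per-component conditional means by responsibility.
    return sum(r * e for r, e in zip(resp, cond))


if __name__ == "__main__":
    print(impute_second_coordinate(
        2.0, [1.0], [(0.0, 0.0)], [((1.0, 0.5), (0.5, 1.0))]))
```

The imputed value could then be passed to a separate discriminant learner in place of the missing entry, which is the use the abstract describes.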

Improved Training of Wasserstein GANs

Dec 25, 2017
Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville

Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserstein GAN (WGAN) makes progress toward stable training of GANs, but sometimes can still generate only low-quality samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalize the norm of the gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models over discrete data. We also achieve high-quality generations on CIFAR-10 and LSUN bedrooms.

* NIPS camera-ready
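To make the penalty described above concrete, here is a toy sketch that penalizes the deviation of the critic's input-gradient norm from 1 at random interpolates between real and generated points. The unit-norm target, the interpolation scheme, and the coefficient `lam=10.0` are assumptions of this example rather than details stated in the abstract, and the finite-difference gradient stands in for the automatic differentiation a real implementation would use. All function names are illustrative.

```python
import math
import random


def numerical_gradient(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at x (list of floats)."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(hi) - f(lo)) / (2.0 * eps))
    return grad


def gradient_penalty(critic, real, fake, lam=10.0, rng=random):
    """Average of lam * (||grad_x critic(x_hat)|| - 1)^2 over interpolated points."""
    total = 0.0
    for x_r, x_f in zip(real, fake):
        t = rng.random()  # random mixing coefficient in [0, 1)
        x_hat = [t * a + (1.0 - t) * b for a, b in zip(x_r, x_f)]
        g = numerical_gradient(critic, x_hat)
        norm = math.sqrt(sum(v * v for v in g))
        total += lam * (norm - 1.0) ** 2
    return total / len(real)


if __name__ == "__main__":
    # Toy linear critic whose gradient has norm 5 everywhere.
    critic = lambda x: 3.0 * x[0] + 4.0 * x[1]
    real = [[0.0, 1.0], [1.0, 1.0]]
    fake = [[2.0, -1.0], [0.5, 0.0]]
    print(gradient_penalty(critic, real, fake))
```

In training, this term would be added to the critic loss in place of weight clipping, so that the Lipschitz constraint is encouraged softly rather than enforced by truncating parameters.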

Learning Visual Reasoning Without Strong Priors

Dec 18, 2017
Ethan Perez, Harm de Vries, Florian Strub, Vincent Dumoulin, Aaron Courville

Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than learn this underlying structure, while leading methods learn to visually reason successfully but are hand-crafted for reasoning. We show that a general-purpose, Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate. We outperform the next best end-to-end method (4.5%) and even methods that use extra supervision (3.1%). We probe our model to shed light on how it reasons, showing it has learned a question-dependent, multi-step process. Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively.

* Full AAAI 2018 paper is at arXiv:1709.07871. Presented at ICML 2017's Machine Learning in Speech and Language Processing Workshop. Code is at http://github.com/ethanjperez/film

FiLM: Visual Reasoning with a General Conditioning Layer

Dec 18, 2017
Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.

* AAAI 2018. Code available at http://github.com/ethanjperez/film . Extends arXiv:1707.03017
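The feature-wise affine transformation described above can be sketched in a few lines: each feature map is scaled and shifted by its own pair of coefficients, and those coefficients are computed from the conditioning input. The sketch below is not the code from the linked repository; the names `film` and `linear`, the nested-list layout, and the choice of a plain linear map as the coefficient generator are assumptions made for illustration.

```python
def linear(cond, weight, bias):
    """Hypothetical generator: one output per feature map from a conditioning vector."""
    return [sum(w * c for w, c in zip(row, cond)) + b
            for row, b in zip(weight, bias)]


def film(features, gamma, beta):
    """Apply gamma[c] * x + beta[c] to every value of feature map c.

    features: list of C feature maps, each a list of rows of floats.
    gamma, beta: length-C lists, assumed to come from the conditioning input.
    """
    return [
        [[g * v + b for v in row] for row in fmap]
        for fmap, g, b in zip(features, gamma, beta)
    ]


if __name__ == "__main__":
    cond = [1.0, 2.0]                      # e.g. an embedding of the question
    gamma = linear(cond, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    beta = linear(cond, [[0.5, 0.0], [0.0, -0.5]], [0.0, 0.0])
    features = [[[1.0, 2.0]], [[3.0, 4.0]]]  # two 1x2 feature maps
    print(film(features, gamma, beta))
```

Because the modulation is per feature map rather than per spatial location, the conditioning input only has to produce two numbers per channel, which keeps the layer cheap regardless of image resolution.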

Modulating early visual processing by language

Dec 18, 2017
Harm de Vries, Florian Strub, Jérémie Mary, Hugo Larochelle, Olivier Pietquin, Aaron Courville

It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic input are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the entire visual processing by linguistic input. Specifically, we condition the batch normalization parameters of a pretrained residual network (ResNet) on a language embedding. This approach, which we call MOdulated RESnet (MODERN), significantly improves strong baselines on two visual question answering tasks. Our ablation study shows that modulating from the early stages of the visual processing is beneficial.

* Advances in Neural Information Processing Systems 30 (NIPS 2017)

GibbsNet: Iterative Adversarial Inference for Deep Graphical Models

Dec 12, 2017
Alex Lamb, Devon Hjelm, Yaroslav Ganin, Joseph Paul Cohen, Aaron Courville, Yoshua Bengio

Directed latent variable models that formulate the joint distribution as $p(x,z) = p(z) p(x \mid z)$ have the advantage of fast and exact sampling. However, these models have the weakness of needing to specify $p(z)$, often with a simple fixed prior that limits the expressiveness of the model. Undirected latent variable models discard the requirement that $p(z)$ be specified with a prior, yet sampling from them generally requires an iterative procedure such as blocked Gibbs-sampling that may require many steps to draw samples from the joint distribution $p(x, z)$. We propose a novel approach to learning the joint distribution between the data and a latent code which uses an adversarially learned iterative procedure to gradually refine the joint distribution, $p(x, z)$, to better match with the data distribution on each step. GibbsNet is the best of both worlds both in theory and in practice. Achieving the speed and simplicity of a directed latent variable model, it is guaranteed (assuming the adversarial game reaches the virtual training criteria global minimum) to produce samples from $p(x, z)$ with only a few sampling iterations. Achieving the expressiveness and flexibility of an undirected latent variable model, GibbsNet does away with the need for an explicit $p(z)$ and has the ability to do attribute prediction, class-conditional generation, and joint image-attribute modeling in a single model which is not trained for any of these specific tasks. We show empirically that GibbsNet is able to learn a more complex $p(z)$ and show that this leads to improved inpainting and iterative refinement of $p(x, z)$ for dozens of steps and stable generation without collapse for thousands of steps, despite being trained on only a few steps.

* NIPS 2017
