Aaron Courville

Université de Montréal

GibbsNet: Iterative Adversarial Inference for Deep Graphical Models

Dec 12, 2017

Alex Lamb, Devon Hjelm, Yaroslav Ganin, Joseph Paul Cohen, Aaron Courville, Yoshua Bengio

Abstract: Directed latent variable models that formulate the joint distribution as $p(x,z) = p(z) p(x \mid z)$ have the advantage of fast and exact sampling. However, these models have the weakness of needing to specify $p(z)$, often with a simple fixed prior that limits the expressiveness of the model. Undirected latent variable models discard the requirement that $p(z)$ be specified with a prior, yet sampling from them generally requires an iterative procedure such as blocked Gibbs-sampling that may require many steps to draw samples from the joint distribution $p(x, z)$. We propose a novel approach to learning the joint distribution between the data and a latent code which uses an adversarially learned iterative procedure to gradually refine the joint distribution, $p(x, z)$, to better match with the data distribution on each step. GibbsNet is the best of both worlds both in theory and in practice. Achieving the speed and simplicity of a directed latent variable model, it is guaranteed (assuming the adversarial game reaches the virtual training criteria global minimum) to produce samples from $p(x, z)$ with only a few sampling iterations. Achieving the expressiveness and flexibility of an undirected latent variable model, GibbsNet does away with the need for an explicit $p(z)$ and has the ability to do attribute prediction, class-conditional generation, and joint image-attribute modeling in a single model which is not trained for any of these specific tasks. We show empirically that GibbsNet is able to learn a more complex $p(z)$ and show that this leads to improved inpainting and iterative refinement of $p(x, z)$ for dozens of steps and stable generation without collapse for thousands of steps, despite being trained on only a few steps.

* NIPS 2017
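
The contrast the abstract draws between one-shot sampling from $p(z) p(x \mid z)$ and iterative blocked Gibbs sampling can be illustrated on a toy problem. The sketch below is not GibbsNet and involves no adversarial training; it assumes a bivariate standard Gaussian over $(x, z)$ with correlation `rho`, for which both conditionals are known in closed form. The function names and the choice of target distribution are illustrative assumptions only.

```python
import numpy as np


def ancestral_sample(rho, n, rng):
    """Directed sampling: draw z from the prior, then x from p(x | z)."""
    z = rng.standard_normal(n)
    x = rho * z + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    return x, z


def blocked_gibbs_chain(rho, n_steps, rng, x0=0.0):
    """Iterative sampling: alternate z ~ p(z | x) and x ~ p(x | z).

    No prior p(z) is used explicitly; the joint emerges as the
    stationary distribution of the chain (toy Gaussian assumption).
    """
    cond_std = np.sqrt(1.0 - rho**2)
    xs = np.empty(n_steps)
    zs = np.empty(n_steps)
    x = x0
    for t in range(n_steps):
        z = rho * x + cond_std * rng.standard_normal()
        x = rho * z + cond_std * rng.standard_normal()
        xs[t], zs[t] = x, z
    return xs, zs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_a, z_a = ancestral_sample(0.8, 20000, rng)
    x_g, z_g = blocked_gibbs_chain(0.8, 20000, rng)
    print("ancestral corr:", np.corrcoef(x_a, z_a)[0, 1])
    print("gibbs corr:    ", np.corrcoef(x_g, z_g)[0, 1])
```

Both procedures target the same joint here, but the chain's successive samples are correlated, which is the cost the abstract attributes to undirected models.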

HoME: a Household Multimodal Environment

Nov 29, 2017

Simon Brodeur, Ethan Perez, Ankesh Anand, Florian Golemo, Luca Celotti, Florian Strub, Jean Rouat, Hugo Larochelle, Aaron Courville

Abstract: We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more. We hope HoME better enables artificial agents to learn as humans do: in an interactive, multimodal, and richly contextualized setting.

* Presented at NIPS 2017's Visually-Grounded Interaction and Language Workshop

Learnable Explicit Density for Continuous Latent Space and Variational Inference

Oct 06, 2017

Chin-Wei Huang, Ahmed Touati, Laurent Dinh, Michal Drozdzal, Mohammad Havaei, Laurent Charlin, Aaron Courville

Abstract: In this paper, we study two aspects of the variational autoencoder (VAE): the prior distribution over the latent variables and its corresponding posterior. First, we decompose the learning of VAEs into layerwise density estimation, and argue that having a flexible prior is beneficial to both sample generation and inference. Second, we analyze the family of inverse autoregressive flows (inverse AF) and show that with further improvement, inverse AF could be used as universal approximation to any complicated posterior. Our analysis results in a unified approach to parameterizing a VAE, without the need to restrict ourselves to use factorial Gaussians in the latent real space.

* 2 figures, 5 pages, submitted to ICML Principled Approaches to Deep Learning workshop

Piecewise Latent Variables for Neural Variational Text Processing

Sep 23, 2017

Iulian V. Serban, Alexander G. Ororbia II, Joelle Pineau, Aaron Courville

Abstract: Advances in neural variational inference have facilitated the learning of powerful directed graphical models with continuous latent variables, such as variational autoencoders. The hope is that such models will learn to represent rich, multi-modal latent factors in real-world data, such as natural language text. However, current models often assume simplistic priors on the latent variables - such as the uni-modal Gaussian distribution - which are incapable of representing complex latent factors efficiently. To overcome this restriction, we propose the simple, but highly flexible, piecewise constant distribution. This distribution has the capacity to represent an exponential number of modes of a latent target distribution, while remaining mathematically tractable. Our results demonstrate that incorporating this new latent distribution into different models yields substantial improvements in natural language processing tasks such as document modeling and natural language generation for dialogue.

* 19 pages, 2 figures, 8 tables; EMNLP 2017
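
To make the idea of a piecewise constant distribution concrete, here is a minimal sketch for a single latent dimension. It assumes the unit interval is split into equal-width pieces, each with a non-negative unnormalized height; this parameterization and the function names are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np


def sample_piecewise_constant(weights, size, rng):
    """Sample from a piecewise constant density on [0, 1).

    `weights` holds one non-negative unnormalized height per
    equal-width piece (an assumed parameterization).
    """
    weights = np.asarray(weights, dtype=float)
    n = weights.size
    probs = weights / weights.sum()        # probability mass of each piece
    pieces = rng.choice(n, size=size, p=probs)
    offsets = rng.random(size)             # uniform position inside the piece
    return (pieces + offsets) / n


def piecewise_constant_pdf(z, weights):
    """Evaluate the normalized density at z in [0, 1]."""
    weights = np.asarray(weights, dtype=float)
    n = weights.size
    probs = weights / weights.sum()
    idx = np.minimum((np.asarray(z) * n).astype(int), n - 1)
    return probs[idx] * n                  # mass divided by piece width 1/n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    heights = [4.0, 0.5, 3.0, 0.5]         # two high pieces, two low pieces
    z = sample_piecewise_constant(heights, 5, rng)
    print(z, piecewise_constant_pdf(z, heights))
```

With one such variable per latent dimension, each dimension can place mass on several disjoint pieces, so the number of representable modes grows multiplicatively with dimensionality.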

Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations

Sep 22, 2017

David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron Courville, Chris Pal

Abstract: We propose zoneout, a novel method for regularizing RNNs. At each timestep, zoneout stochastically forces some hidden units to maintain their previous values. Like dropout, zoneout uses random noise to train a pseudo-ensemble, improving generalization. But by preserving instead of dropping hidden units, gradient information and state information are more readily propagated through time, as in feedforward stochastic depth networks. We perform an empirical investigation of various RNN regularizers, and find that zoneout gives significant performance improvements across tasks. We achieve competitive results with relatively simple models in character- and word-level language modelling on the Penn Treebank and Text8 datasets, and combining with recurrent batch normalization yields state-of-the-art results on permuted sequential MNIST.

* David Krueger and Tegan Maharaj contributed equally to this work
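
The per-timestep update described in the abstract can be sketched in a few lines of NumPy. This is a simplified illustration rather than the authors' implementation: the names `zoneout_step` and `zoneout_prob` are hypothetical, and using the expected mixture at evaluation time is an assumption made here by analogy with dropout.

```python
import numpy as np


def zoneout_step(h_prev, h_candidate, zoneout_prob, rng, training=True):
    """Combine the previous and newly computed hidden states.

    h_prev:       hidden state from the previous timestep
    h_candidate:  hidden state the RNN cell proposes for this timestep
    zoneout_prob: probability that a unit keeps its previous value
    """
    if training:
        # True where a unit is "zoned out" and keeps its old value.
        keep_mask = rng.random(h_prev.shape) < zoneout_prob
        return np.where(keep_mask, h_prev, h_candidate)
    # Evaluation: use the expectation of the random mask (assumption).
    return zoneout_prob * h_prev + (1.0 - zoneout_prob) * h_candidate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h_prev = np.zeros(8)
    h_candidate = np.ones(8)
    print(zoneout_step(h_prev, h_candidate, 0.5, rng))
    print(zoneout_step(h_prev, h_candidate, 0.5, rng, training=False))
```

Unlike dropout, a masked unit is not zeroed: it carries its earlier value forward, so the state it holds still reaches later timesteps.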

Self-organized Hierarchical Softmax

Jul 26, 2017

Yikang Shen, Shawn Tan, Christopher Pal, Aaron Courville

Abstract: We propose a new self-organizing hierarchical softmax formulation for neural-network-based language models over large vocabularies. Instead of using a predefined hierarchical structure, our approach is capable of learning word clusters with clear syntactical and semantic meaning during the language model training process. We provide experiments on standard benchmarks for language modeling and sentence compression tasks. We find that this approach is as fast as other efficient softmax approximations, while achieving comparable or even better performance relative to similar full softmax models.

A Closer Look at Memorization in Deep Networks

Jul 01, 2017

Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio (+1 more)

Abstract: We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that for appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that the notions of effective capacity which are dataset independent are unlikely to explain the generalization performance of deep networks when trained with gradient based methods because training data itself plays an important role in determining the degree of memorization.

* Appears in Proceedings of the 34th International Conference on Machine Learning (ICML 2017); Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, and David Krueger contributed equally to this work

Adversarial Generation of Natural Language

May 31, 2017

Sai Rajeswar, Sandeep Subramanian, Francis Dutil, Christopher Pal, Aaron Courville

Abstract: Generative Adversarial Networks (GANs) have gathered a lot of attention from the computer vision community, yielding impressive results for image generation. Advances in the adversarial generation of natural language from noise, however, are not commensurate with the progress made in generating images, and still lag far behind likelihood based methods. In this paper, we take a step towards generating natural language with a GAN objective alone. We introduce a simple baseline that addresses the discrete output space problem without relying on gradient estimators and show that it is able to achieve state-of-the-art results on a Chinese poem generation dataset. We present quantitative results on generating sentences from context-free and probabilistic context-free grammars, and qualitative language modeling results. A conditional version is also described that can generate sequences conditioned on sentence characteristics.

* 11 pages, 3 figures, 5 tables

End-to-end optimization of goal-driven and visually grounded dialogue systems

Mar 15, 2017

Florian Strub, Harm de Vries, Jeremie Mary, Bilal Piot, Aaron Courville, Olivier Pietquin

Abstract: End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet, most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision is too simplistic to render the intrinsic planning problem inherent to dialogue as well as its grounded nature, making the context of a dialogue larger than the sole history. This is why only chit-chat and question answering tasks have been addressed so far using end-to-end architectures. In this paper, we introduce a Deep Reinforcement Learning method to optimize visually grounded task-oriented dialogues, based on the policy gradient algorithm. This approach is tested on a dataset of 120k dialogues collected through Mechanical Turk and provides encouraging results at solving both the problem of generating natural dialogues and the task of discovering a specific object in a complex picture.

An Actor-Critic Algorithm for Sequence Prediction

Mar 03, 2017

Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, Yoshua Bengio

Abstract: We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a critic network that is trained to predict the value of an output token, given the policy of an actor network. This results in a training procedure that is much closer to the test phase, and allows us to directly optimize for a task-specific score such as BLEU. Crucially, since we leverage these techniques in the supervised learning setting rather than the traditional RL setting, we condition the critic network on the ground-truth output. We show that our method leads to improved performance on both a synthetic task, and for German-English machine translation. Our analysis paves the way for such methods to be applied in natural language generation tasks, such as machine translation, caption generation, and dialogue modelling.
