R Devon Hjelm

Learning deep representations by mutual information estimation and maximization

Oct 03, 2018

R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, Yoshua Bengio

Abstract: In this work, we perform unsupervised learning of representations by maximizing mutual information between an input and the output of a deep neural network encoder. Importantly, we show that structure matters: incorporating knowledge about locality of the input to the objective can greatly influence a representation's suitability for downstream tasks. We further control characteristics of the representation by matching to a prior distribution adversarially. Our method, which we call Deep InfoMax (DIM), outperforms a number of popular unsupervised learning methods and competes with fully-supervised learning on several classification tasks. DIM opens new avenues for unsupervised learning of representations and is an important step towards flexible formulations of representation-learning objectives for specific end-goals.

Deep Graph Infomax

Sep 27, 2018

Petar Veličković, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, R Devon Hjelm

Abstract: We present Deep Graph Infomax (DGI), a general approach for learning node representations within graph-structured data in an unsupervised manner. DGI relies on maximizing mutual information between patch representations and corresponding high-level summaries of graphs, both derived using established graph convolutional network architectures. The learnt patch representations summarize subgraphs centered around nodes of interest, and can thus be reused for downstream node-wise learning tasks. In contrast to most prior approaches to graph representation learning, DGI does not rely on random walks, and is readily applicable to both transductive and inductive learning setups. We demonstrate competitive performance on a variety of node classification benchmarks, which at times even exceeds the performance of supervised learning.

* Under review as a conference paper at ICLR 2019. 15 pages, 8 figures

On-line Adaptative Curriculum Learning for GANs

Sep 12, 2018

Thang Doan, Joao Monteiro, Isabela Albuquerque, Bogdan Mazoure, Audrey Durand, Joelle Pineau, R Devon Hjelm

Abstract: Generative Adversarial Networks (GANs) can successfully approximate a probability distribution and produce realistic samples. However, open questions such as sufficient convergence conditions and mode collapse still persist. In this paper, we build on existing work in the area by proposing a novel framework for training the generator against an ensemble of discriminator networks, which can be seen as a one-student/multiple-teachers setting. We formalize this problem within the full-information adversarial bandit framework, where we evaluate the capability of an algorithm to select mixtures of discriminators for providing the generator with feedback during learning. To this end, we propose a reward function which reflects the progress made by the generator and dynamically update the mixture weights allocated to each discriminator. We also draw connections between our algorithm and stochastic optimization methods and then show that existing approaches using multiple discriminators in the literature can be recovered from our framework. We argue that less expressive discriminators are smoother and have a general coarse-grained view of the modes map, which forces the generator to cover a wide portion of the data distribution support. On the other hand, highly expressive discriminators ensure sample quality. Finally, experimental results show that our approach improves sample quality and diversity over existing baselines by effectively learning a curriculum. These results also support the claim that weaker discriminators have higher entropy, improving mode coverage.
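
As a rough illustration of dynamically updating mixture weights in a full-information setting, the sketch below uses the standard exponential-weights (Hedge) update. This is an assumption chosen for illustration: the abstract does not state which bandit algorithm or reward function the paper uses, and the names `hedge_update`, `eta`, and the reward values are hypothetical.

```python
import math

def hedge_update(weights, rewards, eta=0.5):
    """One exponential-weights step: upweight discriminators with higher reward.

    weights: current mixture weights over discriminators (sums to 1).
    rewards: one reward per discriminator, all observed (full information).
    eta: learning rate (hypothetical value, not taken from the paper).
    """
    scaled = [w * math.exp(eta * r) for w, r in zip(weights, rewards)]
    total = sum(scaled)
    return [s / total for s in scaled]

# Three discriminators, starting from a uniform mixture.
weights = [1.0 / 3.0] * 3

# Made-up per-round rewards; discriminator 2 is consistently the most useful.
reward_history = [
    [0.1, 0.2, 0.9],
    [0.0, 0.3, 0.8],
    [0.2, 0.1, 0.7],
]
for rewards in reward_history:
    weights = hedge_update(weights, rewards)
```

After these rounds the mixture stays normalized and concentrates on the discriminator with the largest cumulative reward, which is the behaviour a curriculum over discriminators would rely on.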

Spatio-temporal Dynamics of Intrinsic Networks in Functional Magnetic Imaging Data Using Recurrent Neural Networks

Aug 27, 2018

R Devon Hjelm, Eswar Damaraju, Kyunghyun Cho, Helmut Laufs, Sergey M. Plis, Vince Calhoun

Abstract: We introduce a novel recurrent neural network (RNN) approach to account for temporal dynamics and dependencies in brain networks observed via functional magnetic resonance imaging (fMRI). Our approach directly parameterizes temporal dynamics through recurrent connections, which can be used to formulate blind source separation with a conditional (rather than marginal) independence assumption, which we call RNN-ICA. This formulation enables us to visualize the temporal dynamics of both first order (activity) and second order (directed connectivity) information in brain networks that are widely studied in a static sense, but not well-characterized dynamically. RNN-ICA predicts dynamics directly from the recurrent states of the RNN in both task and resting state fMRI. Our results show both task-related and group-differentiating directed connectivity.

* Accepted to Frontiers of Neuroscience

Variance Regularizing Adversarial Learning

Aug 19, 2018

Karan Grewal, R Devon Hjelm, Yoshua Bengio

Abstract: We introduce a novel approach for training adversarial models by replacing the discriminator score with a bi-modal Gaussian distribution over the real/fake indicator variables. In order to do this, we train the Gaussian classifier to match the target bi-modal distribution implicitly through meta-adversarial training. We hypothesize that this approach ensures a non-zero gradient to the generator, even in the limit of a perfect classifier. We test our method on standard benchmark image datasets and show that the classifier output distribution is smooth and has overlap between the real and fake modes.

* Method is out of date and some results are incorrect

MINE: Mutual Information Neural Estimation

Jun 07, 2018

Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R Devon Hjelm

Abstract: We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks. We present a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent. We present a handful of applications on which MINE can be used to minimize or maximize mutual information. We apply MINE to improve adversarially trained generative models. We also use MINE to implement Information Bottleneck, applying it to supervised classification; our results demonstrate substantial improvement in flexibility and performance in these settings.

* ICML 2018
* 19 pages, 6 figures
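
MINE builds on the Donsker-Varadhan lower bound on mutual information, I(X;Z) >= E_joint[T] - log E_marginals[exp(T)], where T is a critic function. The sketch below only evaluates that bound on samples; it is not the paper's estimator. It assumes a toy bivariate Gaussian with correlation `rho` and, instead of a trained neural network, plugs in the closed-form optimal critic (the log density ratio). All function names are hypothetical.

```python
import math
import random

def log_density_ratio(x, z, rho):
    """Optimal critic for a standard bivariate Gaussian: log p(x,z) / (p(x) p(z))."""
    quad = rho * rho * (x * x + z * z) - 2.0 * rho * x * z
    return -0.5 * math.log(1.0 - rho * rho) - quad / (2.0 * (1.0 - rho * rho))

def sample_pairs(n, rho, rng):
    """Draw n pairs (x, z) of standard normals with correlation rho."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((x, z))
    return pairs

def dv_lower_bound(pairs, critic, rng):
    """Donsker-Varadhan estimate: E_joint[T] - log E_marginals[exp(T)]."""
    xs = [x for x, _ in pairs]
    zs = [z for _, z in pairs]
    joint_term = sum(critic(x, z) for x, z in pairs) / len(pairs)
    # Shuffling z breaks the pairing, approximating samples from p(x) p(z).
    shuffled = zs[:]
    rng.shuffle(shuffled)
    mean_exp = sum(math.exp(critic(x, z)) for x, z in zip(xs, shuffled)) / len(pairs)
    return joint_term - math.log(mean_exp)

rng = random.Random(0)
rho = 0.5
pairs = sample_pairs(20000, rho, rng)
estimate = dv_lower_bound(pairs, lambda x, z: log_density_ratio(x, z, rho), rng)
true_mi = -0.5 * math.log(1.0 - rho * rho)  # closed form for this toy case
```

With the optimal critic the bound is tight, so `estimate` should land near `true_mi`. In a MINE-style setup the critic would instead be a neural network whose parameters are trained by gradient ascent on this same quantity.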

Boundary-Seeking Generative Adversarial Networks

Feb 21, 2018

R Devon Hjelm, Athul Paul Jacob, Tong Che, Adam Trischler, Kyunghyun Cho, Yoshua Bengio

Abstract: Generative adversarial networks (GANs) are a learning framework that relies on training a discriminator to estimate a measure of difference between target and generated distributions. GANs, as normally formulated, rely on the generated samples being completely differentiable w.r.t. the generative parameters, and thus do not work for discrete data. We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator. The importance weights have a strong connection to the decision boundary of the discriminator, and we call our method boundary-seeking GANs (BGANs). We demonstrate the effectiveness of the proposed algorithm with discrete image and character-based natural language generation. In addition, the boundary-seeking objective extends to continuous data, which can be used to improve stability of training, and we demonstrate this on CelebA, Large-scale Scene Understanding (LSUN) bedrooms, and ImageNet without conditioning.
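
A minimal sketch of how discriminator outputs can become importance weights for a discrete generator is given below. It assumes a sigmoid discriminator score D(x) in (0, 1), uses D / (1 - D) as the estimated density ratio, and normalizes the ratios over the sampled batch. The single categorical generator, the function names, and the numbers are illustrative assumptions, not the paper's implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_importance_weights(d_scores):
    """Turn discriminator scores in (0, 1) into weights that sum to 1."""
    ratios = [d / (1.0 - d) for d in d_scores]
    total = sum(ratios)
    return [r / total for r in ratios]

def weighted_logit_gradient(logits, samples, d_scores):
    """Gradient of sum_m w_m * log g(x_m) w.r.t. the generator logits.

    For a categorical generator g = softmax(logits), the gradient of
    log g(x) with respect to the logits is one_hot(x) - g.
    """
    probs = softmax(logits)
    weights = normalized_importance_weights(d_scores)
    grad = [0.0] * len(logits)
    for w, s in zip(weights, samples):
        for k in range(len(logits)):
            grad[k] += w * ((1.0 if k == s else 0.0) - probs[k])
    return grad

logits = [0.0, 0.0, 0.0]          # uniform generator over three symbols
samples = [0, 1, 2, 2]            # symbols drawn from the generator
d_scores = [0.2, 0.5, 0.8, 0.8]   # made-up discriminator outputs per sample
grad = weighted_logit_gradient(logits, samples, d_scores)
```

Ascending this gradient raises the logit of symbol 2, which the discriminator rates as most realistic, and lowers the logit of symbol 0, without differentiating through the discrete samples.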

Iterative Refinement of the Approximate Posterior for Directed Belief Networks

Feb 20, 2018

R Devon Hjelm, Kyunghyun Cho, Junyoung Chung, Russ Salakhutdinov, Vince Calhoun, Nebojsa Jojic

Abstract: Variational methods that rely on a recognition network to approximate the posterior of directed graphical models offer better inference and learning than previous methods. Recent advances that exploit the capacity and flexibility in this approach have expanded what kinds of models can be trained. However, as a proposal for the posterior, the capacity of the recognition network is limited, which can constrain the representational power of the generative model and increase the variance of Monte Carlo estimates. To address these issues, we introduce an iterative refinement procedure for improving the approximate posterior of the recognition network and show that training with the refined posterior is competitive with state-of-the-art methods. The advantages of refinement are further evident in an increased effective sample size, which implies a lower variance of gradient estimates.

ACtuAL: Actor-Critic Under Adversarial Learning

Nov 13, 2017

Anirudh Goyal, Nan Rosemary Ke, Alex Lamb, R Devon Hjelm, Chris Pal, Joelle Pineau, Yoshua Bengio

Abstract: Generative Adversarial Networks (GANs) are a powerful framework for deep generative modeling. Posed as a two-player minimax problem, GANs are typically trained end-to-end on real-valued data and can be used to train a generator of high-dimensional and realistic images. However, a major limitation of GANs is that training relies on passing gradients from the discriminator through the generator via back-propagation. This makes it fundamentally difficult to train GANs with discrete data, as generation in this case typically involves a non-differentiable function. These difficulties extend to the reinforcement learning setting when the action space is composed of discrete decisions. We address these issues by reframing the GAN framework so that the generator is no longer trained using gradients through the discriminator, but is instead trained using a learned critic in the actor-critic framework with a Temporal Difference (TD) objective. This is a natural fit for sequence modeling and we use it to achieve improvements on language modeling tasks over the standard Teacher-Forcing methods.

Maximum-Likelihood Augmented Discrete Generative Adversarial Networks

Feb 26, 2017

Tong Che, Yanran Li, Ruixiang Zhang, R Devon Hjelm, Wenjie Li, Yangqiu Song, Yoshua Bengio

Abstract: Despite the successes in capturing continuous distributions, the application of generative adversarial networks (GANs) to discrete settings, like natural language tasks, is rather restricted. The fundamental reason is the difficulty of back-propagation through discrete random variables combined with the inherent instability of the GAN training objective. To address these problems, we propose Maximum-Likelihood Augmented Discrete Generative Adversarial Networks. Instead of directly optimizing the GAN objective, we derive a novel and low-variance objective using the discriminator's output that corresponds to the log-likelihood. Compared with the original, the new objective is proved to be consistent in theory and beneficial in practice. The experimental results on various discrete datasets demonstrate the effectiveness of the proposed approach.

* 11 pages, 3 figures
