Aaron Courville

Riemannian Diffusion Models

Aug 16, 2022
Chin-Wei Huang, Milad Aghajohari, Avishek Joey Bose, Prakash Panangaden, Aaron Courville

Diffusion models are recent state-of-the-art methods for image generation and likelihood estimation. In this work, we generalize continuous-time diffusion models to arbitrary Riemannian manifolds and derive a variational framework for likelihood estimation. Computationally, we propose new methods for computing the Riemannian divergence, which is needed for likelihood estimation. Moreover, in generalizing the Euclidean case, we prove that maximizing this variational lower bound is equivalent to Riemannian score matching. Empirically, we demonstrate the expressive power of Riemannian diffusion models on a wide spectrum of smooth manifolds, such as spheres, tori, hyperboloids, and orthogonal groups. Our proposed method achieves new state-of-the-art likelihoods on all benchmarks.

R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS

Jun 30, 2022
Kyle Kastner, Aaron Courville

This paper introduces R-MelNet, a two-part autoregressive architecture with a frontend based on the first tier of MelNet and a backend WaveRNN-style audio decoder for neural text-to-speech synthesis. Taking as input a mixed sequence of characters and phonemes, with an optional audio priming sequence, this model produces low-resolution mel-spectral features which are interpolated and used by a WaveRNN decoder to produce an audio waveform. Coupled with half precision training, R-MelNet uses under 11 gigabytes of GPU memory on a single commodity GPU (NVIDIA 2080Ti). We detail a number of critical implementation details for stable half precision training, including an approximate, numerically stable mixture of logistics attention. Using a stochastic, multi-sample per step inference scheme, the resulting model generates highly varied audio, while enabling text and audio based controls to modify output waveforms. Qualitative and quantitative evaluations of an R-MelNet system trained on a single speaker TTS dataset demonstrate the effectiveness of our approach.

Building Robust Ensembles via Margin Boosting

Jun 07, 2022
Dinghuai Zhang, Hongyang Zhang, Aaron Courville, Yoshua Bengio, Pradeep Ravikumar, Arun Sai Suggala

In the context of adversarial robustness, a single model does not usually have enough power to defend against all possible adversarial attacks and, as a result, has sub-optimal robustness. Consequently, an emerging line of work has focused on learning an ensemble of neural networks to defend against adversarial attacks. In this work, we take a principled approach towards building robust ensembles. We view this problem from the perspective of margin-boosting and develop an algorithm for learning an ensemble with maximum margin. Through extensive empirical evaluation on benchmark datasets, we show that our algorithm outperforms not only existing ensembling techniques but also large models trained in an end-to-end fashion. An important byproduct of our work is a margin-maximizing cross-entropy (MCE) loss, which is a better alternative to the standard cross-entropy (CE) loss. Empirically, we show that replacing the CE loss in state-of-the-art adversarial training techniques with our MCE loss leads to significant performance improvement.

* Accepted by ICML 2022

Beyond Tabula Rasa: Reincarnating Reinforcement Learning

Jun 03, 2022
Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

Learning tabula rasa, that is without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research. However, RL systems, when applied to large-scale settings, rarely operate tabula rasa. Such large-scale systems undergo multiple design or algorithmic changes during their development cycle and use ad hoc approaches for incorporating these changes without re-training from scratch, which would have been prohibitively expensive. Additionally, the inefficiency of deep RL typically excludes researchers without access to industrial-scale resources from tackling computationally-demanding problems. To address these issues, we present reincarnating RL as an alternative workflow, where prior computational work (e.g., learned policies) is reused or transferred between design iterations of an RL agent, or from one RL agent to another. As a step towards enabling reincarnating RL from any agent to any other agent, we focus on the specific setting of efficiently transferring an existing sub-optimal policy to a standalone value-based RL agent. We find that existing approaches fail in this setting and propose a simple algorithm to address their limitations. Equipped with this algorithm, we demonstrate reincarnating RL's gains over tabula rasa RL on Atari 2600 games, a challenging locomotion task, and the real-world problem of navigating stratospheric balloons. Overall, this work argues for an alternative approach to RL research, which we believe could significantly improve real-world RL adoption and help democratize it further.

Expressiveness and Learnability: A Unifying View for Evaluating Self-Supervised Learning

Jun 02, 2022
Yuchen Lu, Zhen Liu, Aristide Baratin, Romain Laroche, Aaron Courville, Alessandro Sordoni

We propose a unifying view to analyze the representation quality of self-supervised learning (SSL) models without access to supervised labels, while being agnostic to the architecture, learning algorithm or data manipulation used during training. We argue that representations can be evaluated through the lens of expressiveness and learnability. We propose to use the Intrinsic Dimension (ID) to assess expressiveness and introduce Cluster Learnability (CL) to assess learnability. CL is measured as the learning speed of a KNN classifier trained to predict labels obtained by clustering the representations with K-means. We thus combine CL and ID into a single predictor: CLID. Through a large-scale empirical study with a diverse family of SSL algorithms, we find that CLID better correlates with in-distribution model performance than other competing recent evaluation schemes. We also benchmark CLID on out-of-domain generalization, where CLID serves as a predictor of the transfer performance of SSL models on several classification tasks, yielding improvements with respect to the competing baselines.
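
The Cluster Learnability measurement described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not define "learning speed", so the sketch uses online (predict-then-reveal) KNN accuracy as a stand-in, and the helper names `kmeans_labels` and `online_knn_accuracy`, the first-k centroid initialisation, and the toy data are all hypothetical. The Intrinsic Dimension estimate and the way CL and ID are combined into CLID are not given in the abstract and are left out.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_labels(points, k, iters=10):
    """Tiny Lloyd's k-means; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def online_knn_accuracy(points, labels, n_neighbors=1):
    """Predict each cluster label from earlier points only, then reveal it.

    The mean accuracy over the stream is used here as an assumed proxy for
    how quickly a KNN classifier learns the cluster labels.
    """
    correct = 0
    for i in range(1, len(points)):
        nearest = sorted(range(i), key=lambda j: sq_dist(points[i], points[j]))[:n_neighbors]
        votes = [labels[j] for j in nearest]
        prediction = max(set(votes), key=votes.count)
        correct += prediction == labels[i]
    return correct / (len(points) - 1)

# Toy "representations": two well-separated groups, interleaved.
reps = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
cluster_labels = kmeans_labels(reps, k=2)
cl_score = online_knn_accuracy(reps, cluster_labels)
```

In this sketch, a higher `cl_score` means the cluster structure is picked up after fewer examples; real representations would come from an SSL encoder rather than hand-written tuples.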

Cascaded Video Generation for Videos In-the-Wild

Jun 01, 2022
Lluis Castrejon, Nicolas Ballas, Aaron Courville

Videos can be created by first outlining a global view of the scene and then adding local details. Inspired by this idea, we propose a cascaded model for video generation which follows a coarse-to-fine approach. First, our model generates a low-resolution video, establishing the global scene structure, which is then refined by subsequent cascade levels operating at larger resolutions. We train each cascade level sequentially on partial views of the videos, which reduces the computational complexity of our model and makes it scalable to high-resolution videos with many frames. We empirically validate our approach on UCF101 and Kinetics-600, for which our model is competitive with the state-of-the-art. We further demonstrate the scaling capabilities of our model and train a three-level model on the BDD100K dataset which generates 256x256-pixel videos with 48 frames.

* Accepted to the 26th International Conference on Pattern Recognition (ICPR 2022). arXiv admin note: substantial text overlap with arXiv:2106.02719

The Primacy Bias in Deep Reinforcement Learning

May 16, 2022
Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville

This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences, negatively affecting the rest of the learning process. Inspired by cognitive science, we refer to this effect as the primacy bias. Through a series of experiments, we dissect the algorithmic aspects of deep RL that exacerbate this bias. We then propose a simple yet generally-applicable mechanism that tackles the primacy bias by periodically resetting a part of the agent. We apply this mechanism to algorithms in both discrete (Atari 100k) and continuous action (DeepMind Control Suite) domains, consistently improving their performance.

* ICML 2022; code at https://github.com/evgenii-nikishin/rl_with_resets
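
The resetting mechanism mentioned in the abstract might look roughly like the sketch below. It is an illustration under assumptions, not the code from the linked repository: the abstract does not say which part of the agent is reset or how often, so the `"head"` layer, the 250-step period, the uniform initialisation, and the helper names `init_layer` and `maybe_reset` are all hypothetical.

```python
import random

def init_layer(size, rng):
    """Fresh small random weights for one layer (initialisation scheme is assumed)."""
    return [rng.uniform(-0.1, 0.1) for _ in range(size)]

def maybe_reset(params, step, reset_every, reset_keys, rng):
    """Re-initialise the layers named in reset_keys once every reset_every steps."""
    if step == 0 or step % reset_every != 0:
        return False
    for key in reset_keys:
        params[key] = init_layer(len(params[key]), rng)
    return True

rng = random.Random(0)
params = {"encoder": init_layer(4, rng), "head": init_layer(2, rng)}
encoder_before = list(params["encoder"])
head_before = list(params["head"])

resets = 0
for step in range(1, 1001):
    # ... one gradient update on the data collected so far would go here ...
    # Only the (hypothetical) "head" layer is reset; the encoder is untouched.
    resets += maybe_reset(params, step, reset_every=250, reset_keys=["head"], rng=rng)
```

The sketch assumes that previously collected experience is kept across resets, so the re-initialised layers can be relearned from all of the data rather than only the earliest interactions.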

Simplicial Embeddings in Self-Supervised Learning and Downstream Classification

Apr 01, 2022
Samuel Lavoie, Christos Tsirigotis, Max Schwarzer, Kenji Kawaguchi, Ankit Vani, Aaron Courville

We introduce Simplicial Embeddings (SEMs) as a way to constrain the encoded representations of a self-supervised model to $L$ simplices of $V$ dimensions each using a Softmax operation. This procedure imposes a structure on the representations that reduces their expressivity for training downstream classifiers, which helps them generalize better. Specifically, we show that the temperature $\tau$ of the Softmax operation controls the expressivity of the SEM representation, allowing us to derive a tighter downstream classifier generalization bound than that for classifiers using unnormalized representations. We empirically demonstrate that SEMs considerably improve generalization on natural image datasets such as CIFAR-100 and ImageNet. Finally, we also present evidence of the emergence of semantically relevant features in SEMs, a pattern that is absent from baseline self-supervised models.

* 22 pages, 5 figures, 5 tables, Preprint
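
The grouped Softmax described in the abstract can be sketched in a few lines. This is a minimal illustration, assuming the encoder output is a flat vector of length `L * V` that is split into `L` contiguous groups; the function name `simplicial_embedding`, the grouping order, and the example values are assumptions, and where the operation sits inside the network is not stated in the abstract.

```python
import math

def softmax(xs, tau):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / tau for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def simplicial_embedding(z, L, V, tau=1.0):
    """Split z (length L*V) into L groups of V entries and softmax each group."""
    if len(z) != L * V:
        raise ValueError("expected a vector of length L * V")
    out = []
    for g in range(L):
        out.extend(softmax(z[g * V:(g + 1) * V], tau))
    return out

z = [2.0, 0.0, -1.0, 0.5, 0.5, 0.5]
sem = simplicial_embedding(z, L=2, V=3, tau=0.5)        # sharper groups
sem_smooth = simplicial_embedding(z, L=2, V=3, tau=5.0)  # closer to uniform
```

Each group of `V` entries is non-negative and sums to one, so it lies on a simplex. Lowering `tau` pushes each group towards a one-hot vector, while raising it pushes each group towards uniform, which is the knob the abstract ties to expressivity.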

Generative Flow Networks for Discrete Probabilistic Modeling

Feb 03, 2022
Dinghuai Zhang, Nikolay Malkin, Zhen Liu, Alexandra Volokhova, Aaron Courville, Yoshua Bengio

We present energy-based generative flow networks (EB-GFN), a novel probabilistic modeling algorithm for high-dimensional discrete data. Building upon the theory of generative flow networks (GFlowNets), we model the generation process by a stochastic data construction policy and thus amortize expensive MCMC exploration into a fixed number of actions sampled from a GFlowNet. We show how GFlowNets can approximately perform large-block Gibbs sampling to mix between modes. We propose a framework to jointly train a GFlowNet with an energy function, so that the GFlowNet learns to sample from the energy distribution, while the energy learns with an approximate MLE objective with negative samples from the GFlowNet. We demonstrate EB-GFN's effectiveness on various probabilistic modeling tasks.

* 17 pages; code: https://github.com/zdhNarsil/EB_GFN

Fortuitous Forgetting in Connectionist Networks

Feb 01, 2022
Hattie Zhou, Ankit Vani, Hugo Larochelle, Aaron Courville

Forgetting is often seen as an unwanted characteristic in both human and machine learning. However, we propose that forgetting can in fact be favorable to learning. We introduce "forget-and-relearn" as a powerful paradigm for shaping the learning trajectories of artificial neural networks. In this process, the forgetting step selectively removes undesirable information from the model, and the relearning step reinforces features that are consistently useful under different conditions. The forget-and-relearn framework unifies many existing iterative training algorithms in the image classification and language emergence literature, and allows us to understand the success of these algorithms in terms of the disproportionate forgetting of undesirable information. We leverage this understanding to improve upon existing algorithms by designing more targeted forgetting operations. Insights from our analysis provide a coherent view on the dynamics of iterative training in neural networks and offer a clear path towards performance improvements.

* ICLR 2022
* ICLR Camera Ready
