Aaron Courville

Universite de Montreal

MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling

Dec 17, 2021

Yusong Wu, Ethan Manilow, Yi Deng, Rigel Swavely, Kyle Kastner, Tim Cooijmans, Aaron Courville, Cheng-Zhi Anna Huang, Jesse Engel

Abstract: Musical expression requires control of both what notes are played and how they are performed. Conventional audio synthesizers provide detailed expressive controls, but at the cost of realism. Black-box neural audio synthesis and concatenative samplers can produce realistic audio, but have few mechanisms for control. In this work, we introduce MIDI-DDSP, a hierarchical model of musical instruments that enables both realistic neural audio synthesis and detailed user control. Starting from interpretable Differentiable Digital Signal Processing (DDSP) synthesis parameters, we infer musical notes and high-level properties of their expressive performance (such as timbre, vibrato, dynamics, and articulation). This creates a 3-level hierarchy (notes, performance, synthesis) that affords individuals the option to intervene at each level, or utilize trained priors (performance given notes, synthesis given performance) for creative assistance. Through quantitative experiments and listening tests, we demonstrate that this hierarchy can reconstruct high-fidelity audio, accurately predict performance attributes for a note sequence, independently manipulate the attributes of a given performance, and as a complete system, generate realistic audio from a novel note sequence. By utilizing an interpretable hierarchy, with multiple levels of granularity, MIDI-DDSP opens the door to assistive tools to empower individuals across a diverse range of musical experience.

DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization

Dec 09, 2021

Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine

Abstract: Despite overparameterization, deep networks trained via supervised learning are easy to optimize and exhibit excellent generalization. One hypothesis to explain this is that overparameterized deep networks enjoy the benefits of implicit regularization induced by stochastic gradient descent, which favors parsimonious solutions that generalize well on test inputs. It is reasonable to surmise that deep reinforcement learning (RL) methods could also benefit from this effect. In this paper, we discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in the offline deep RL setting, leading to poor generalization and degenerate feature representations. Our theoretical analysis shows that when existing models of implicit regularization are applied to temporal difference learning, the resulting derived regularizer favors degenerate solutions with excessive "aliasing", in stark contrast to the supervised learning case. We back up these findings empirically, showing that feature representations learned by a deep network value function trained via bootstrapping can indeed become degenerate, aliasing the representations for state-action pairs that appear on either side of the Bellman backup. To address this issue, we derive the form of this implicit regularizer and, inspired by this derivation, propose a simple and effective explicit regularizer, called DR3, that counteracts the undesirable effects of this implicit regularizer. When combined with existing offline RL methods, DR3 substantially improves performance and stability, alleviating unlearning in Atari 2600 games, D4RL domains and robotic manipulation from images.
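
The abstract above describes feature "aliasing" between state-action pairs on either side of the Bellman backup without giving a formula. As a rough illustration only, the sketch below assumes that aliasing can be measured by the average dot product between the feature vector of a pair and that of its successor pair. The function name, the toy feature vectors, and the choice of dot product are assumptions of this sketch and are not taken from the abstract.

```python
def mean_feature_dot_product(features, next_features):
    """Average dot product between paired feature vectors.

    features[i] stands for a (hypothetical) learned feature vector of a
    state-action pair, next_features[i] for that of its successor pair.
    """
    total = 0.0
    for phi, phi_next in zip(features, next_features):
        total += sum(x * y for x, y in zip(phi, phi_next))
    return total / len(features)


# Toy feature vectors (made-up numbers, for illustration only).
phi = [[1.0, 0.0], [0.0, 1.0]]
aliased_next = [[1.0, 0.0], [0.0, 1.0]]   # successors share the same features
distinct_next = [[0.0, 1.0], [1.0, 0.0]]  # successors are orthogonal

aliased_score = mean_feature_dot_product(phi, aliased_next)
distinct_score = mean_feature_dot_product(phi, distinct_next)
print(aliased_score, distinct_score)
```

Under this assumption, an explicit regularizer could add a small multiple of such a score to the training loss so that consecutive pairs are discouraged from collapsing onto the same features; the weighting and the exact form would need to be checked against the paper itself.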

Multi-label Iterated Learning for Image Classification with Label Ambiguity

Nov 23, 2021

Sai Rajeswar, Pau Rodriguez, Soumye Singhal, David Vazquez, Aaron Courville

Abstract: Transfer learning from large-scale pre-trained models has become essential for many computer vision tasks. Recent studies have shown that datasets like ImageNet are weakly labeled since images with multiple object classes present are assigned a single label. This ambiguity biases models towards a single prediction, which could result in the suppression of classes that tend to co-occur in the data. Inspired by language emergence literature, we propose multi-label iterated learning (MILe) to incorporate the inductive biases of multi-label learning from single labels using the framework of iterated learning. MILe is a simple yet effective procedure that builds a multi-label description of the image by propagating binary predictions through successive generations of teacher and student networks with a learning bottleneck. Experiments show that our approach exhibits systematic benefits on ImageNet accuracy as well as ReaL F1 score, which indicates that MILe deals better with label ambiguity than the standard training procedure, even when fine-tuning from self-supervised weights. We also show that MILe is effective at reducing label noise, achieving state-of-the-art performance on real-world large-scale noisy data such as WebVision. Furthermore, MILe improves performance in class incremental settings such as IIRC and is robust to distribution shifts. Code: https://github.com/rajeswar18/MILe

Chunked Autoregressive GAN for Conditional Waveform Synthesis

Oct 19, 2021

Max Morrison, Rithesh Kumar, Kundan Kumar, Prem Seetharaman, Aaron Courville, Yoshua Bengio

Abstract: Conditional waveform synthesis models learn a distribution of audio waveforms given conditioning such as text, mel-spectrograms, or MIDI. These systems employ deep generative models that model the waveform via either sequential (autoregressive) or parallel (non-autoregressive) sampling. Generative adversarial networks (GANs) have become a common choice for non-autoregressive waveform synthesis. However, state-of-the-art GAN-based models produce artifacts when performing mel-spectrogram inversion. In this paper, we demonstrate that these artifacts correspond with an inability of the generator to learn accurate pitch and periodicity. We show that simple pitch and periodicity conditioning is insufficient for reducing this error relative to using autoregression. We discuss the inductive bias that autoregression provides for learning the relationship between instantaneous frequency and phase, and show that this inductive bias holds even when autoregressively sampling large chunks of the waveform during each forward pass. Relative to prior state-of-the-art GAN-based models, our proposed model, Chunked Autoregressive GAN (CARGAN), reduces pitch error by 40-60%, reduces training time by 58%, maintains a fast generation speed suitable for real-time or interactive applications, and maintains or improves subjective quality.

* Under review as a conference paper at ICLR 2022
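
The idea of autoregressively sampling large chunks of the waveform, as described in the abstract above, can be pictured with a minimal loop. This is a sketch under stated assumptions and not the CARGAN implementation: `generate_chunked`, `toy_generate_chunk`, the `context_size` parameter, and the toy "generator" that simply continues a ramp are all hypothetical stand-ins for a trained model.

```python
def generate_chunked(generate_chunk, conditioning_frames, context_size):
    """Produce a waveform chunk by chunk.

    Each chunk is generated in one call (one "forward pass") and is
    conditioned on the last `context_size` samples generated so far.
    """
    waveform = []
    for frame in conditioning_frames:
        # Previously generated samples act as the autoregressive context.
        context = waveform[-context_size:] if context_size > 0 else []
        waveform.extend(generate_chunk(frame, context))
    return waveform


def toy_generate_chunk(step, context, chunk_size=3):
    """Hypothetical stand-in for a generator: continue a ramp from the context."""
    last = context[-1] if context else 0
    return [last + step * (i + 1) for i in range(chunk_size)]


waveform = generate_chunked(toy_generate_chunk, [1, 2], context_size=2)
print(waveform)
```

The point of the sketch is only the control flow: within a chunk all samples are produced in parallel, while consecutive chunks are linked through the context, which is how each new chunk can stay continuous with what came before.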

Unifying Likelihood-free Inference with Black-box Sequence Design and Beyond

Oct 06, 2021

Dinghuai Zhang, Jie Fu, Yoshua Bengio, Aaron Courville

Abstract: Black-box optimization formulations for biological sequence design have drawn recent attention due to their promising potential impact on the pharmaceutical industry. In this work, we propose to unify two seemingly distinct worlds: likelihood-free inference and black-box sequence design, under one probabilistic framework. In tandem, we provide a recipe for constructing various sequence design methods based on this framework. We show how previous drug discovery approaches can be "reinvented" in our framework, and further propose new probabilistic sequence design algorithms. Extensive experiments illustrate the benefits of the proposed methodology.

On Bonus-Based Exploration Methods in the Arcade Learning Environment

Sep 22, 2021

Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare

Abstract: Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma's Revenge (Bellemare et al., 2016). Recently, bonus-based exploration methods, which explore by augmenting the environment reward, have reached above-human average performance on such domains. In this paper we reassess popular bonus-based exploration methods within a common evaluation framework. We combine Rainbow (Hessel et al., 2018) with different exploration bonuses and evaluate its performance on Montezuma's Revenge, Bellemare et al.'s set of hard exploration games with sparse rewards, and the whole Atari 2600 suite. We find that while exploration bonuses lead to a higher score on Montezuma's Revenge, they do not provide meaningful gains over the simpler $\epsilon$-greedy scheme. In fact, we find that methods that perform best on that game often underperform $\epsilon$-greedy on easy exploration Atari 2600 games. We find that our conclusions remain valid even when hyperparameters are tuned for these easy-exploration games. Finally, we find that none of the methods surveyed benefit from additional training samples (1 billion frames, versus Rainbow's 200 million) on Bellemare et al.'s hard exploration games. Our results suggest that recent gains in Montezuma's Revenge may be better attributed to architecture change, rather than better exploration schemes; and that the real pace of progress in exploration research for Atari 2600 games may have been obfuscated by good results on a single domain.

* Published as a conference paper at ICLR 2020
* Full version of arXiv:1908.02388
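
For readers unfamiliar with the $\epsilon$-greedy baseline that the abstract above compares against, here is a minimal sketch of the standard rule: with probability $\epsilon$ take a uniformly random action, otherwise take the action with the highest estimated value. The function name and the example value estimates are illustrative assumptions and do not come from the paper's code.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest estimated value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Made-up action-value estimates for three actions.
q_values = [0.1, 0.9, 0.3]
greedy_action = epsilon_greedy(q_values, epsilon=0.0)
print(greedy_action)
```

With `epsilon=0.0` the rule is purely greedy, and with `epsilon=1.0` it is uniformly random; bonus-based methods instead change the reward signal itself, which is the contrast the abstract draws.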

Deep Reinforcement Learning at the Edge of the Statistical Precipice

Aug 30, 2021

Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

Abstract: Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few-run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied by an open-source library rliable, to prevent unreliable results from stagnating the field.
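
To make the interquartile mean and interval estimates mentioned in the abstract above concrete, the following sketch computes an interquartile mean (the mean of the middle 50% of scores) together with a simple percentile bootstrap interval. It uses only the Python standard library and does not use the rliable API; the function names, the flat (non-stratified) resampling, and the score values are assumptions made for illustration.

```python
import random
import statistics


def interquartile_mean(scores):
    """Mean of the middle 50% of scores (drops the bottom and top 25%)."""
    ordered = sorted(scores)
    cut = int(0.25 * len(ordered))  # number of values trimmed from each end
    return statistics.fmean(ordered[cut:len(ordered) - cut])


def bootstrap_interval(scores, statistic, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for `statistic` over `scores`."""
    rng = random.Random(seed)
    estimates = sorted(
        statistic(rng.choices(scores, k=len(scores))) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical normalized scores from a handful of runs (made-up numbers).
scores = [0.10, 0.35, 0.40, 0.42, 0.45, 0.50, 0.55, 2.90]
point = interquartile_mean(scores)
low, high = bootstrap_interval(scores, interquartile_mean)
print(point, (low, high))
```

In this toy example the single outlier of 2.90 pulls the plain mean up to about 0.71, while the interquartile mean stays at 0.4425, which illustrates why a trimmed aggregate can be more robust when only a few runs are available.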

Pretraining Representations for Data-Efficient Reinforcement Learning

Jun 09, 2021

Max Schwarzer, Nitarshan Rajkumar, Michael Noukhovitch, Ankesh Anand, Laurent Charlin, Devon Hjelm, Philip Bachman, Aaron Courville

Abstract: Data efficiency is a key challenge for deep reinforcement learning. We address this problem by using unlabeled data to pretrain an encoder which is then finetuned on a small amount of task-specific data. To encourage learning representations which capture diverse aspects of the underlying MDP, we employ a combination of latent dynamics modelling and unsupervised goal-conditioned RL. When limited to 100k steps of interaction on Atari games (equivalent to two hours of human experience), our approach significantly surpasses prior work combining offline representation pretraining with task-specific finetuning, and compares favourably with other pretraining methods that require orders of magnitude more data. Our approach shows particular promise when combined with larger models as well as more diverse, task-aligned observational data -- approaching human-level performance and data-efficiency on Atari in our best setting. We provide code associated with this work at https://github.com/mila-iqia/SGI.

Can Subnetwork Structure be the Key to Out-of-Distribution Generalization?

Jun 05, 2021

Dinghuai Zhang, Kartik Ahuja, Yilun Xu, Yisen Wang, Aaron Courville

Abstract: Can models with particular structure avoid being biased towards spurious correlation in out-of-distribution (OOD) generalization? Peters et al. (2016) provide a positive answer for linear cases. In this paper, we use a functional modular probing method to analyze deep model structures under the OOD setting. We demonstrate that even in biased models (which focus on spurious correlation) there still exist unbiased functional subnetworks. Furthermore, we articulate and demonstrate the functional lottery ticket hypothesis: a full network contains a subnetwork that can achieve better OOD performance. We then propose Modular Risk Minimization to solve the subnetwork selection problem. Our algorithm learns the subnetwork structure from a given dataset, and can be combined with any other OOD regularization methods. Experiments on various OOD generalization tasks corroborate the effectiveness of our method.

* Accepted to ICML2021 as long talk

A Variational Perspective on Diffusion-Based Generative Models and Score Matching

Jun 05, 2021

Chin-Wei Huang, Jae Hyun Lim, Aaron Courville

Abstract: Discrete-time diffusion-based generative models and score matching methods have shown promising results in modeling high-dimensional image data. Recently, Song et al. (2021) show that diffusion processes that transform data into noise can be reversed via learning the score function, i.e. the gradient of the log-density of the perturbed data. They propose to plug the learned score function into an inverse formula to define a generative diffusion process. Despite the empirical success, a theoretical underpinning of this procedure is still lacking. In this work, we approach the (continuous-time) generative diffusion directly and derive a variational framework for likelihood estimation, which includes continuous-time normalizing flows as a special case, and can be seen as an infinitely deep variational autoencoder. Under this framework, we show that minimizing the score-matching loss is equivalent to maximizing a lower bound of the likelihood of the plug-in reverse SDE proposed by Song et al. (2021), bridging the theoretical gap.
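
The "inverse formula" and "plug-in reverse SDE" referred to in the abstract above are commonly written as follows. The notation is an assumption of this sketch, following the usual convention for score-based diffusion models, and is not quoted from the abstract: $f$ is the drift, $g$ the diffusion coefficient, $p_t$ the density of the perturbed data at time $t$, $w$ and $\bar{w}$ forward and reverse-time Brownian motions, and $s_\theta$ a learned score model.

```latex
\begin{align}
  % Forward (noising) diffusion that transforms data into noise
  \mathrm{d}x &= f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w \\
  % Reverse-time diffusion, driven by the score \nabla_x \log p_t(x)
  \mathrm{d}x &= \left[ f(x, t) - g(t)^2\,\nabla_x \log p_t(x) \right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{w} \\
  % Plug-in version: the true score is replaced by the learned model
  \mathrm{d}x &= \left[ f(x, t) - g(t)^2\, s_\theta(x, t) \right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{w}
\end{align}
```

The second and third lines differ only in that the unknown score $\nabla_x \log p_t(x)$ is swapped for $s_\theta(x, t)$, which is the substitution whose likelihood the abstract says is lower-bounded by the score-matching objective.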
