Yoshua Bengio

DIRO

N-BEATS: Neural basis expansion analysis for interpretable time series forecasting

May 28, 2019

Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, Yoshua Bengio

Abstract: We focus on solving the univariate time series point forecasting problem using deep learning. We propose a deep neural architecture based on backward and forward residual links and a very deep stack of fully-connected layers. The architecture has a number of desirable properties, being interpretable, applicable without modification to a wide array of target domains, and fast to train. We test the proposed architecture on the well-known M4 competition dataset containing 100k time series from diverse domains. We demonstrate state-of-the-art performance for two configurations of N-BEATS, improving forecast accuracy by 11% over a statistical benchmark and by 3% over last year's winner of the M4 competition, a domain-adjusted hand-crafted hybrid between neural network and statistical time series models. The first configuration of our model does not employ any time-series-specific components and its performance on the M4 dataset strongly suggests that, contrary to received wisdom, deep learning primitives such as residual blocks are by themselves sufficient to solve a wide range of forecasting problems. Finally, we demonstrate how the proposed architecture can be augmented to provide outputs that are interpretable without loss in accuracy.
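
The "backward and forward residual links" mentioned in the abstract can be illustrated with a small sketch. It assumes that each block returns a backcast (its reconstruction of the input window) and a partial forecast, that the backcast is subtracted from the input passed to the next block, and that the partial forecasts are summed. The names `doubly_residual_forecast` and `make_mean_block` are hypothetical, and the toy mean block stands in for the fully-connected stacks of the paper; this is not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

# A block maps an input window to (backcast, partial forecast).
Block = Callable[[Sequence[float]], Tuple[List[float], List[float]]]


def doubly_residual_forecast(
    history: Sequence[float], blocks: Sequence[Block], horizon: int
) -> Tuple[List[float], List[float]]:
    """Chain blocks with backward (input) and forward (output) residual links."""
    residual = list(history)
    forecast = [0.0] * horizon
    for block in blocks:
        backcast, partial = block(residual)
        # Backward link: remove what this block already explained.
        residual = [r - b for r, b in zip(residual, backcast)]
        # Forward link: accumulate this block's contribution to the forecast.
        forecast = [f + p for f, p in zip(forecast, partial)]
    return forecast, residual


def make_mean_block(horizon: int) -> Block:
    """Toy stand-in for a learned block: explains the window by its mean level."""
    def block(window: Sequence[float]) -> Tuple[List[float], List[float]]:
        level = sum(window) / len(window)
        return [level] * len(window), [level] * horizon
    return block


if __name__ == "__main__":
    blocks = [make_mean_block(2), make_mean_block(2)]
    print(doubly_residual_forecast([1.0, 2.0, 3.0], blocks, horizon=2))
```

In this toy run the first block removes the mean level, so the second block sees a zero-mean residual and adds nothing further to the forecast.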

State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations

May 26, 2019

Alex Lamb, Jonathan Binas, Anirudh Goyal, Sandeep Subramanian, Ioannis Mitliagkas, Denis Kazakov, Yoshua Bengio, Michael C. Mozer

Abstract: Machine learning promises methods that generalize well from finite labeled data. However, the brittleness of existing neural net approaches is revealed by notable failures, such as the existence of adversarial examples that are misclassified despite being nearly identical to a training example, or the inability of recurrent sequence-processing nets to stay on track without teacher forcing. We introduce a method, which we refer to as state reification, that involves modeling the distribution of hidden states over the training data and then projecting hidden states observed during testing toward this distribution. Our intuition is that if the network can remain in a familiar manifold of hidden space, subsequent layers of the net should be well trained to respond appropriately. We show that this state-reification method helps neural nets to generalize better, especially when labeled data are sparse, and also helps overcome the challenge of achieving robust generalization with adversarial training.
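
The idea of projecting test-time hidden states toward the training distribution can be sketched as follows. The abstract does not say how the distribution is modeled, so this sketch makes a loud assumption: the "model" is simply a stored list of training hidden states, and projection moves a hidden state part of the way toward its nearest stored neighbour. The function `reify` and the `strength` parameter are hypothetical names used for illustration only.

```python
from typing import List, Sequence


def reify(
    hidden: Sequence[float],
    train_states: Sequence[Sequence[float]],
    strength: float = 0.5,
) -> List[float]:
    """Pull a hidden state toward the closest hidden state seen in training.

    Assumption: a nearest-neighbour memory stands in for the learned
    model of the hidden-state distribution described in the abstract.
    """
    def squared_distance(state: Sequence[float]) -> float:
        return sum((h - s) ** 2 for h, s in zip(hidden, state))

    nearest = min(train_states, key=squared_distance)
    # strength = 0 leaves the state unchanged; strength = 1 snaps to the neighbour.
    return [h + strength * (s - h) for h, s in zip(hidden, nearest)]


if __name__ == "__main__":
    memory = [[0.0, 0.0], [4.0, 4.0]]
    print(reify([1.0, 1.0], memory, strength=0.5))
```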

* ICML 2019 [full oral]. arXiv admin note: text overlap with arXiv:1805.08394

Compositional generalization in a deep seq2seq model by separating syntax and semantics

May 23, 2019

Jake Russin, Jason Jo, Randall C. O'Reilly, Yoshua Bengio

Abstract: Standard methods in deep learning for natural language processing fail to capture the compositional structure of human language that allows for systematic generalization outside of the training distribution. However, human learners readily generalize in this way, e.g. by applying known grammatical rules to novel words. Inspired by work in neuroscience suggesting separate brain systems for syntactic and semantic processing, we implement a modification to standard approaches in neural machine translation, imposing an analogous separation. The novel model, which we call Syntactic Attention, substantially outperforms standard methods in deep learning on the SCAN dataset, a compositional generalization task, without any hand-engineered features or additional supervision. Our work suggests that separating syntactic from semantic learning may be a useful heuristic for capturing compositional structure.

* 18 pages, 15 figures, preprint version of submission to NeurIPS 2019, under review

The Journey is the Reward: Unsupervised Learning of Influential Trajectories

May 22, 2019

Jonathan Binas, Sherjil Ozair, Yoshua Bengio

Abstract: Unsupervised exploration and representation learning become increasingly important when learning in diverse and sparse environments. The information-theoretic principle of empowerment formalizes an unsupervised exploration objective through an agent trying to maximize its influence on the future states of its environment. Previous approaches carry certain limitations in that they either do not employ closed-loop feedback or do not have an internal state. As a consequence, a privileged final state is taken as an influence measure, rather than the full trajectory. We provide a model-free method which takes into account the whole trajectory while still offering the benefits of option-based approaches. We successfully apply our approach to settings with large action spaces, where discovery of meaningful action sequences is particularly difficult.

* ICML'19 ERL Workshop

GMNN: Graph Markov Neural Networks

May 15, 2019

Meng Qu, Yoshua Bengio, Jian Tang

Abstract: This paper studies semi-supervised object classification in relational data, which is a fundamental problem in relational data modeling. The problem has been extensively studied in the literature of both statistical relational learning (e.g. relational Markov networks) and graph neural networks (e.g. graph convolutional networks). Statistical relational learning methods can effectively model the dependency of object labels through conditional random fields for collective classification, whereas graph neural networks learn effective object representations for classification through end-to-end training. In this paper, we propose the Graph Markov Neural Network (GMNN) that combines the advantages of both worlds. A GMNN models the joint distribution of object labels with a conditional random field, which can be effectively trained with the variational EM algorithm. In the E-step, one graph neural network learns effective object representations for approximating the posterior distributions of object labels. In the M-step, another graph neural network is used to model the local label dependency. Experiments on object classification, link classification, and unsupervised node representation learning show that GMNN achieves state-of-the-art results.

* ICML 2019

Visualizing the Consequences of Climate Change Using Cycle-Consistent Adversarial Networks

May 02, 2019

Victor Schmidt, Alexandra Luccioni, S. Karthik Mukkavilli, Narmada Balasooriya, Kris Sankaran, Jennifer Chayes, Yoshua Bengio

Abstract: We present a project that aims to generate images that depict accurate, vivid, and personalized outcomes of climate change using Cycle-Consistent Adversarial Networks (CycleGANs). By training our CycleGAN model on street-view images of houses before and after extreme weather events (e.g. floods and forest fires), we learn a mapping that can then be applied to images of locations that have not yet experienced these events. This visual transformation is paired with climate model predictions to assess the likelihood and type of climate-related events in the long term (50 years) in order to bring the future closer in the viewer's mind. The eventual goal of our project is to enable individuals to make more informed choices about their climate future by creating a more visceral understanding of the effects of climate change, while maintaining scientific credibility by drawing on climate model projections.

GradMask: Reduce Overfitting by Regularizing Saliency

Apr 16, 2019

Becks Simpson, Francis Dutil, Yoshua Bengio, Joseph Paul Cohen

Abstract: With too few samples or too many model parameters, overfitting can inhibit the ability to generalise predictions to new data. Within medical imaging, this can occur when features such as hospital-specific artifacts are incorrectly assigned importance, leading to poor performance on a new dataset from a different institution without those features. Most regularization methods do not explicitly penalize the incorrect association of these features with the target class and hence fail to address this issue. We propose a regularization method, GradMask, which penalizes saliency maps inferred from the classifier gradients when they are not consistent with the lesion segmentation. This prevents non-tumor-related features from contributing to the classification of unhealthy samples. We demonstrate that this method can improve test accuracy by 1-3% compared to the baseline without GradMask, showing that it has an impact on reducing overfitting.
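
The regularizer described above can be sketched in a few lines. The abstract only states that saliency inconsistent with the lesion segmentation is penalized; this sketch assumes that saliency is the gradient of a scalar score with respect to the input, and that the penalty is the sum of squared gradients over pixels outside the lesion mask. The gradient is estimated by central finite differences so the example needs no autodiff library. The names `input_gradient` and `gradmask_penalty` are hypothetical.

```python
from typing import Callable, List, Sequence


def input_gradient(
    score_fn: Callable[[Sequence[float]], float],
    x: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Estimate d score / d x[i] with central finite differences."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((score_fn(up) - score_fn(down)) / (2 * eps))
    return grads


def gradmask_penalty(
    score_fn: Callable[[Sequence[float]], float],
    x: Sequence[float],
    lesion_mask: Sequence[int],
) -> float:
    """Sum of squared saliency over pixels outside the lesion mask (assumed form)."""
    grads = input_gradient(score_fn, x)
    return sum(g * g for g, inside in zip(grads, lesion_mask) if not inside)


if __name__ == "__main__":
    # Toy linear scorer: its saliency is the weight vector (2, 3, -1).
    def score(pixels: Sequence[float]) -> float:
        return 2.0 * pixels[0] + 3.0 * pixels[1] - 1.0 * pixels[2]

    # Only the first pixel lies inside the lesion, so the other two are penalized.
    print(gradmask_penalty(score, [0.5, 0.5, 0.5], lesion_mask=[1, 0, 0]))
```

In training, a term like this would be added to the classification loss with some weighting coefficient; the weighting scheme is not given in the abstract.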

Speech Model Pre-training for End-to-End Spoken Language Understanding

Apr 07, 2019

Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, Yoshua Bengio

Abstract: Whereas conventional spoken language understanding (SLU) systems map speech to text, and then text to intent, end-to-end SLU systems map speech directly to intent through a single trainable model. Achieving high accuracy with these end-to-end models without a large amount of training data is difficult. We propose a method to reduce the data requirements of end-to-end SLU in which the model is first pre-trained to predict words and phonemes, thus learning good features for SLU. We introduce a new SLU dataset, Fluent Speech Commands, and show that our method improves performance both when the full dataset is used for training and when only a small subset is used. We also describe preliminary experiments to gauge the model's ability to generalize to new phrases not heard during training.

Reinforced Imitation in Heterogeneous Action Space

Apr 06, 2019

Konrad Zolna, Negar Rostamzadeh, Yoshua Bengio, Sungjin Ahn, Pedro O. Pinheiro

Abstract: Imitation learning is an effective alternative approach to learn a policy when the reward function is sparse. In this paper, we consider a challenging setting where an agent and an expert use different actions from each other. We assume that the agent has access to a sparse reward function and state-only expert observations. We propose a method which gradually balances between the imitation learning cost and the reinforcement learning objective. In addition, this method adapts the agent's policy based on either mimicking expert behavior or maximizing sparse reward. We show, through navigation scenarios, that (i) an agent is able to efficiently leverage sparse rewards to outperform standard state-only imitation learning, (ii) it can learn a policy even when its actions are different from the expert, and (iii) the performance of the agent is not bounded by that of the expert, due to the optimized usage of sparse rewards.
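
One way to picture a gradual balance between an imitation signal and a sparse environment reward is a weighted sum whose weight shifts over training. The abstract does not specify the schedule or the form of the combination, so everything here is an assumption: a linear decay of the imitation weight and a convex mix of the two signals. The names `imitation_weight` and `mixed_reward` are hypothetical.

```python
def imitation_weight(step: int, total_steps: int) -> float:
    """Linearly decay the imitation weight from 1 to 0 (assumed schedule)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return max(0.0, 1.0 - step / total_steps)


def mixed_reward(
    env_reward: float, imitation_reward: float, step: int, total_steps: int
) -> float:
    """Convex combination of imitation signal and sparse environment reward."""
    lam = imitation_weight(step, total_steps)
    return lam * imitation_reward + (1.0 - lam) * env_reward


if __name__ == "__main__":
    # Early in training the imitation signal dominates; later the sparse reward does.
    for step in (0, 50, 100):
        print(step, mixed_reward(1.0, 0.0, step, total_steps=100))
```

Under such a schedule the agent would lean on expert observations while the sparse reward is rarely encountered, then rely increasingly on the environment reward, which is consistent with the claim that its performance need not be bounded by the expert's.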

* The extended version of the work "Reinforced Imitation Learning from Observations" presented at the NeurIPS workshop "Imitation Learning and its Challenges in Robotics"

Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks

Apr 06, 2019

Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio

Abstract: Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure. Some recent works, however, have shown that it is possible to derive useful speech representations by employing a self-supervised encoder-discriminator approach. This paper proposes an improved self-supervised method, where a single neural encoder is followed by multiple workers that jointly solve different self-supervised tasks. The needed consensus across different tasks naturally imposes meaningful constraints on the encoder, helping it to discover general representations and minimizing the risk of learning superficial ones. Experiments show that the proposed approach can learn transferable, robust, and problem-agnostic features that carry relevant information from the speech signal, such as speaker identity, phonemes, and even higher-level features such as emotional cues. In addition, a number of design choices make the encoder easily exportable, facilitating its direct usage or adaptation to different problems.
