Dzmitry Bahdanau

Systematic Generalization: What Is Required and Can It Be Learned?

Nov 30, 2018

Dzmitry Bahdanau, Shikhar Murty, Michael Noukhovitch, Thien Huu Nguyen, Harm de Vries, Aaron Courville

Abstract:Numerous models for grounded language understanding have been recently proposed, including (i) generic models that can be adapted to any given task with little effort and (ii) intuitively appealing modular models that require background knowledge to be instantiated. We compare both types of models in how much they lend themselves to a particular form of systematic generalization. Using a synthetic VQA test, we evaluate which models are capable of reasoning about all possible object pairs after training on only a small subset of them. Our findings show that the generalization of modular models is much more systematic and that it is highly sensitive to the module layout, i.e., to how exactly the modules are connected. We furthermore investigate whether modular models that generalize well could be made more end-to-end by learning their layout and parametrization. We find that end-to-end methods from prior work often learn a wrong layout and a spurious parametrization that do not facilitate systematic generalization. Our results suggest that, in addition to modularity, systematic generalization in language understanding may require explicit regularizers or priors.

* Preprint. Work in progress

BabyAI: First Steps Towards Grounded Language Learning With a Human In the Loop

Oct 27, 2018

Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, Yoshua Bengio

Abstract:Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of the current learning methods, this goal may require substantial research efforts. Here, we introduce the BabyAI research platform to support investigations towards including humans in the loop for grounded language learning. The BabyAI platform comprises an extensible suite of 19 levels of increasing difficulty. The levels gradually lead the agent towards acquiring a combinatorially rich synthetic language which is a proper subset of English. The platform also provides a heuristic expert agent for the purpose of simulating a human teacher. We report baseline results and estimate the amount of human involvement that would be required to train a neural network-based agent on some of the BabyAI levels. We put forward strong evidence that current deep learning methods are not yet sufficiently sample efficient when it comes to learning a language with compositional properties.

* Preprint. Work in progress

Learning to Understand Goal Specifications by Modelling Reward

Oct 02, 2018

Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Pushmeet Kohli, Edward Grefenstette

Abstract:Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards. However, this places on environment designers the onus of designing language-conditional reward functions which may not be easily or tractably implemented as the complexity of the environment and the language scales. To overcome this limitation, we present a framework within which instruction-conditional RL agents are trained using rewards obtained not from the environment, but from reward models which are jointly trained from expert examples. As reward models improve, they learn to accurately reward agents for completing tasks for environment configurations (and for instructions) not present amongst the expert data. This framework effectively separates the representation of what instructions require from how they can be executed. In a simple grid world, it enables an agent to learn a range of commands requiring interaction with blocks and understanding of spatial relations and underspecified abstract arrangements. We further show the method allows our agent to adapt to changes in the environment without requiring new expert examples.

* 18 pages, 8 figures

Commonsense mining as knowledge base completion? A study on the impact of novelty

Apr 24, 2018

Stanisław Jastrzębski, Dzmitry Bahdanau, Seyedarian Hosseini, Michael Noukhovitch, Yoshua Bengio, Jackie Chi Kit Cheung

Abstract:Commonsense knowledge bases such as ConceptNet represent knowledge in the form of relational triples. Inspired by the recent work by Li et al., we analyse whether knowledge base completion models can be used to mine commonsense knowledge from raw text. We propose novelty of predicted triples with respect to the training set as an important factor in interpreting results. We critically analyse the difficulty of mining novel commonsense knowledge, and show that a simple baseline method outperforms the previous state of the art on predicting more novel triples.

* Published in Workshop on New Forms of Generalization in Deep Learning and Natural Language Processing (NAACL 2018)

Learning to Compute Word Embeddings On the Fly

Mar 07, 2018

Dzmitry Bahdanau, Tom Bosc, Stanisław Jastrzębski, Edward Grefenstette, Pascal Vincent, Yoshua Bengio

Abstract:Words in natural language follow a Zipfian distribution whereby some words are frequent but most are rare. Learning representations for words in the "long tail" of this distribution requires enormous amounts of data. Representations of rare words trained directly on end tasks are usually poor, requiring us to pre-train embeddings on external data, or treat all rare words as out-of-vocabulary words with a unique representation. We provide a method for predicting embeddings of rare words on the fly from small amounts of auxiliary data with a network trained end-to-end for the downstream task. We show that this improves results against baselines where embeddings are trained on the end task for reading comprehension, recognizing textual entailment and language modeling.

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control

Oct 16, 2017

Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck

Abstract:This paper proposes a general method for improving the structure and quality of sequences generated by a recurrent neural network (RNN), while maintaining information originally learned from data, as well as sample diversity. An RNN is first pre-trained on data using maximum likelihood estimation (MLE), and the probability distribution over the next token in the sequence learned by this model is treated as a prior policy. Another RNN is then trained using reinforcement learning (RL) to generate higher-quality outputs that account for domain-specific incentives while retaining proximity to the prior policy of the MLE RNN. To formalize this objective, we derive novel off-policy RL methods for RNNs from KL-control. The effectiveness of the approach is demonstrated on two applications: 1) generating novel musical melodies, and 2) computational molecular generation. For both problems, we show that the proposed method improves the desired properties and structure of the generated sequences, while maintaining information learned from data.

* Add supplementary material
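
The trade-off described in the abstract above, between task reward and proximity to a pre-trained prior policy, can be illustrated with a generic KL-regularized objective. This is a minimal sketch and not the paper's derivation: the single-step setting, the function names `kl_divergence` and `kl_regularized_objective`, the weight `beta`, and the toy numbers are all assumptions made here for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_regularized_objective(policy, prior, rewards, beta):
    """Expected task reward minus a penalty for drifting from the prior.

    `beta` (hypothetical name) controls how strongly the policy is
    held close to the prior distribution.
    """
    expected_reward = sum(p * r for p, r in zip(policy, rewards))
    return expected_reward - beta * kl_divergence(policy, prior)

# Toy next-token distributions over three tokens (illustrative values).
prior = [0.5, 0.3, 0.2]    # stands in for the MLE-trained model
rewards = [0.0, 0.0, 1.0]  # task reward favours the third token
greedy = [0.0, 0.0, 1.0]   # policy that ignores the prior entirely

for beta in (0.0, 1.0):
    print(beta,
          kl_regularized_objective(prior, prior, rewards, beta),
          kl_regularized_objective(greedy, prior, rewards, beta))
```

With `beta = 0.0` the greedy policy scores higher because only the task reward counts; with `beta = 1.0` the penalty for leaving the prior outweighs the extra reward in this toy setting.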

An Actor-Critic Algorithm for Sequence Prediction

Mar 03, 2017

Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, Yoshua Bengio

Abstract:We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a critic network that is trained to predict the value of an output token, given the policy of an actor network. This results in a training procedure that is much closer to the test phase, and allows us to directly optimize for a task-specific score such as BLEU. Crucially, since we leverage these techniques in the supervised learning setting rather than the traditional RL setting, we condition the critic network on the ground-truth output. We show that our method leads to improved performance on both a synthetic task and German-English machine translation. Our analysis paves the way for such methods to be applied in natural language generation tasks, such as machine translation, caption generation, and dialogue modelling.

Neural Machine Translation by Jointly Learning to Align and Translate

May 19, 2016

Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio

Abstract:Neural machine translation is a recently proposed approach to machine translation. Unlike traditional statistical machine translation, neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consist of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.

* Accepted at ICLR 2015 as oral presentation
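
The (soft-)search described in the abstract above can be pictured as a weighted average over source positions instead of a single fixed-length vector. The following is a minimal sketch under stated assumptions, not the authors' implementation: the dot-product score is chosen here only for brevity, and the names `soft_attention`, `decoder_state`, and `annotations` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(decoder_state, annotations):
    """Return soft alignment weights and the resulting context vector.

    Each source annotation is scored against the decoder state with a
    dot product (an assumption for this sketch), the scores are
    normalized with a softmax, and the context vector is the weighted
    sum of the annotations.
    """
    scores = [sum(d * a for d, a in zip(decoder_state, ann))
              for ann in annotations]
    weights = softmax(scores)
    dim = len(annotations[0])
    context = [sum(w * ann[i] for w, ann in zip(weights, annotations))
               for i in range(dim)]
    return weights, context

# Three toy source annotations of dimension 2 (illustrative values).
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = soft_attention([2.0, 0.0], annotations)
print(weights)
print(context)
```

Because every source position receives a non-zero weight, no hard segment is ever selected; positions that score higher against the decoder state simply contribute more to the context vector.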

Theano: A Python framework for fast computation of mathematical expressions

May 09, 2016

The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov(+103 more)

Abstract:Theano is a Python library that allows users to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers, especially in the machine learning community, and has shown steady performance improvements. Theano has been actively and continuously developed since 2008; multiple frameworks have been built on top of it, and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.

* 19 pages, 5 figures

End-to-End Attention-based Large Vocabulary Speech Recognition

Mar 14, 2016

Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio

Abstract:Many of the current state-of-the-art Large Vocabulary Continuous Speech Recognition Systems (LVCSR) are hybrids of neural networks and Hidden Markov Models (HMMs). Most of these systems contain separate components that deal with the acoustic modelling, language modelling and sequence decoding. We investigate a more direct approach in which the HMM is replaced with a Recurrent Neural Network (RNN) that performs sequence prediction directly at the character level. Alignment between the input features and the desired character sequence is learned automatically by an attention mechanism built into the RNN. For each predicted character, the attention mechanism scans the input sequence and chooses relevant frames. We propose two methods to speed up this operation: limiting the scan to a subset of most promising frames and pooling over time the information contained in neighboring frames, thereby reducing source sequence length. Integrating an n-gram language model into the decoding process yields recognition accuracies similar to other HMM-free RNN-based approaches.
