Adam Trischler

Plan, Attend, Generate: Planning for Sequence-to-Sequence Models

Nov 28, 2017
Francis Dutil, Caglar Gulcehre, Adam Trischler, Yoshua Bengio

We investigate the integration of a planning mechanism into sequence-to-sequence models using attention. We develop a model that can plan ahead when it computes its alignments between input and output sequences, constructing a matrix of proposed future alignments and a commitment vector that governs whether to follow or recompute the plan. This mechanism is inspired by the recently proposed strategic attentive reader and writer (STRAW) model for Reinforcement Learning. Our proposed model is end-to-end trainable using primarily differentiable operations. We show that it outperforms a strong baseline on character-level translation tasks from WMT'15, the algorithmic task of finding Eulerian circuits of graphs, and question generation from text. Our analysis demonstrates that the model computes qualitatively intuitive alignments, converges faster than the baselines, and achieves superior performance with fewer parameters.
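
The follow-or-recompute behaviour described in this abstract can be illustrated with a toy NumPy sketch. It is not the authors' implementation: the function names, the hard 0.5 threshold, the `replan` callback, and the padding used when the plan is shifted are all assumptions made for illustration, whereas the paper's model learns these decisions inside a trained network.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(plan, commit, replan):
    """One toy decoding step with an alignment plan.

    plan:   (k, src_len) array of alignment scores for the next k steps.
    commit: (k,) array; commit[0] >= 0.5 means "recompute the plan now".
    replan: hypothetical callback returning a fresh (plan, commit) pair.
    """
    if commit[0] >= 0.5:
        # Commitment says the current plan is stale: build a new one.
        plan, commit = replan()
    # The first row of the plan is the alignment used at this step.
    attention = softmax(plan[0])
    # Shift plan and commitment forward by one step. Padding the commitment
    # with 1.0 (an assumption) forces a re-plan once the plan is exhausted.
    next_plan = np.vstack([plan[1:], np.zeros_like(plan[:1])])
    next_commit = np.append(commit[1:], 1.0)
    return attention, next_plan, next_commit
```

In this sketch, a freshly proposed commitment vector such as `[0, 0, 1]` means the plan is followed for one further step before being recomputed.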

* NIPS 2017

Variational Bi-LSTMs

Nov 15, 2017
Samira Shabanian, Devansh Arpit, Adam Trischler, Yoshua Bengio

Recurrent neural networks like long short-term memory (LSTM) are important architectures for sequential prediction tasks. LSTMs (and RNNs in general) model sequences along the forward time direction. Bidirectional LSTMs (Bi-LSTMs), on the other hand, model sequences along both forward and backward directions and are generally known to perform better at such tasks because they capture a richer representation of the data. In the training of Bi-LSTMs, the forward and backward paths are learned independently. We propose a variant of the Bi-LSTM architecture, which we call Variational Bi-LSTM, that creates a channel between the two paths during training (the channel may be omitted during inference), thus optimizing the two paths jointly. We arrive at this joint objective for our model by minimizing a variational lower bound of the joint likelihood of the data sequence. Our model acts as a regularizer and encourages the two networks to inform each other in making their respective predictions using distinct information. We perform ablation studies to better understand the different components of our model and evaluate the method on various benchmarks, showing state-of-the-art performance.

Learning Algorithms for Active Learning

Jul 31, 2017
Philip Bachman, Alessandro Sordoni, Adam Trischler

We introduce a model that learns active learning algorithms via metalearning. For a distribution of related tasks, our model jointly learns: a data representation, an item selection heuristic, and a method for constructing prediction functions from labeled training sets. Our model uses the item selection heuristic to gather labeled training sets from which to construct prediction functions. Using the Omniglot and MovieLens datasets, we test our model in synthetic and practical settings.

* Accepted for publication at ICML 2017

Plan, Attend, Generate: Character-level Neural Machine Translation with Planning in the Decoder

Jun 23, 2017
Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio

We investigate the integration of a planning mechanism into an encoder-decoder architecture with an explicit alignment for character-level machine translation. We develop a model that plans ahead when it computes alignments between the source and target sequences, constructing a matrix of proposed future alignments and a commitment vector that governs whether to follow or recompute the plan. This mechanism is inspired by the strategic attentive reader and writer (STRAW) model. Our proposed model is end-to-end trainable with fully differentiable operations. We show that it outperforms a strong baseline on three neural machine translation tasks with character-level decoders on the WMT'15 corpus. Our analysis demonstrates that our model can compute qualitatively intuitive alignments and achieves superior performance with fewer parameters.

* Accepted to Rep4NLP 2017 Workshop at ACL 2017 Conference

A Joint Model for Question Answering and Question Generation

Jun 05, 2017
Tong Wang, Xingdi Yuan, Adam Trischler

We propose a generative machine comprehension model that learns jointly to ask and answer questions based on documents. The proposed model uses a sequence-to-sequence framework that encodes the document and generates a question (answer) given an answer (question). Significant improvement in model performance is observed empirically on the SQuAD corpus, confirming our hypothesis that the model benefits from jointly learning to perform both tasks. We believe the joint model's novelty offers a new perspective on machine comprehension beyond architectural engineering, and serves as a first step towards autonomous information seeking.

Machine Comprehension by Text-to-Text Neural Question Generation

May 15, 2017
Xingdi Yuan, Tong Wang, Caglar Gulcehre, Alessandro Sordoni, Philip Bachman, Sandeep Subramanian, Saizheng Zhang, Adam Trischler

We propose a recurrent neural model that generates natural-language questions from documents, conditioned on answers. We show how to train the model using a combination of supervised and reinforcement learning. After teacher forcing for standard maximum likelihood training, we fine-tune the model using policy gradient techniques to maximize several rewards that measure question quality. Most notably, one of these rewards is the performance of a question-answering system. We motivate question generation as a means to improve the performance of question answering systems. Our model is trained and evaluated on the recent question-answering dataset SQuAD.

NewsQA: A Machine Comprehension Dataset

Feb 07, 2017
Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, Kaheer Suleman

We present NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text from the corresponding articles. We collect this dataset through a four-stage process designed to solicit exploratory questions that require reasoning. A thorough analysis confirms that NewsQA demands abilities beyond simple word matching and recognizing textual entailment. We measure human performance on the dataset and compare it to several strong neural models. The performance gap between humans and machines (0.198 in F1) indicates that significant progress can be made on NewsQA through future research. The dataset is freely available at https://datasets.maluuba.com/NewsQA.
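
The abstract reports the human-machine gap in F1 but does not say how F1 is computed. A common choice for span-based answers is token-overlap F1 between the predicted and reference spans; the sketch below assumes that definition, plus simple lowercasing and whitespace tokenization, and the function name `token_f1` is hypothetical.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer span."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty is a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this definition, a prediction that contains the full reference plus one extra token is penalized only on precision, so partially correct spans still receive credit.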

Towards Information-Seeking Agents

Dec 08, 2016
Philip Bachman, Alessandro Sordoni, Adam Trischler

We develop a general problem setting for training and testing the ability of agents to gather information efficiently. Specifically, we present a collection of tasks in which success requires searching through a partially observed environment for fragments of information that can be pieced together to accomplish various goals. We combine deep architectures with techniques from reinforcement learning to develop agents that solve our tasks. We shape the behavior of these agents by combining extrinsic and intrinsic rewards. We empirically demonstrate that these agents learn to search actively and intelligently for new information to reduce their uncertainty, and to exploit information they have already acquired.

* Under review for ICLR 2017

Iterative Alternating Neural Attention for Machine Reading

Nov 09, 2016
Alessandro Sordoni, Philip Bachman, Adam Trischler, Yoshua Bengio

We propose a novel neural attention architecture to tackle machine comprehension tasks, such as answering Cloze-style queries with respect to a document. Unlike previous models, we do not collapse the query into a single vector; instead, we deploy an iterative alternating attention mechanism that allows a fine-grained exploration of both the query and the document. Our model outperforms state-of-the-art baselines in standard machine comprehension benchmarks such as CNN news articles and the Children's Book Test (CBT) dataset.
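
The idea of alternating between attention over the query and attention over the document can be illustrated with a toy, parameter-free NumPy sketch. Everything here is an assumption made for illustration: dot-product scoring, the `tanh` state update, and the function name `alternating_attention` stand in for the learned components that the paper's model would use.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def alternating_attention(query, document, steps=3):
    """Toy alternating attention.

    query:    (n_query_tokens, dim) array of query token vectors.
    document: (n_doc_tokens, dim) array of document token vectors.
    Returns the final attention weights over document tokens.
    """
    state = np.zeros(query.shape[1])
    doc_weights = np.full(document.shape[0], 1.0 / document.shape[0])
    for _ in range(steps):
        # Query glimpse: attend over query tokens given the current state,
        # instead of collapsing the query into one fixed vector up front.
        query_weights = softmax(query @ state)
        query_glimpse = query_weights @ query
        # Document glimpse: attend over document tokens, conditioned on
        # both the state and the query glimpse just taken.
        doc_weights = softmax(document @ (state + query_glimpse))
        doc_glimpse = doc_weights @ document
        # Simple non-learned state update (assumption for this sketch).
        state = np.tanh(state + query_glimpse + doc_glimpse)
    return doc_weights
```

Each iteration can focus on a different part of the query, which is what makes the exploration fine-grained compared with a single fixed query vector.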

Natural Language Comprehension with the EpiReader

Jun 10, 2016
Adam Trischler, Zheng Ye, Xingdi Yuan, Kaheer Suleman

We present the EpiReader, a novel model for machine comprehension of text. Machine comprehension of unstructured, real-world text is a major research goal for natural language processing. Current tests of machine comprehension pose questions whose answers can be inferred from some supporting text, and evaluate a model's response to the questions. The EpiReader is an end-to-end neural model comprising two components: the first component proposes a small set of candidate answers after comparing a question to its supporting text, and the second component formulates hypotheses using the proposed candidates and the question, then reranks the hypotheses based on their estimated concordance with the supporting text. We present experiments demonstrating that the EpiReader sets a new state-of-the-art on the CNN and Children's Book Test machine comprehension benchmarks, outperforming previous neural models by a significant margin.

* 8 pages plus references. Submitted to EMNLP 2016
