Laurent Charlin

University of Toronto

Focused Hierarchical RNNs for Conditional Sequence Processing

Jun 12, 2018

Nan Rosemary Ke, Konrad Zolna, Alessandro Sordoni, Zhouhan Lin, Adam Trischler, Yoshua Bengio, Joelle Pineau, Laurent Charlin, Chris Pal

Abstract: Recurrent Neural Networks (RNNs) with attention mechanisms have obtained state-of-the-art results for many sequence processing tasks. Most of these models use a simple form of encoder with attention that looks over the entire sequence and assigns a weight to each token independently. We present a mechanism for focusing RNN encoders for sequence modelling tasks that allows them to attend to key parts of the input as needed. We formulate this using a multi-layer conditional sequence encoder that reads in one token at a time and makes a discrete decision on whether the token is relevant to the context or question being asked. The discrete gating mechanism takes in the context embedding and the current hidden state as inputs and controls information flow into the layer above. We train it using policy gradient methods. We evaluate this method on several types of tasks with different attributes. First, we evaluate the method on synthetic tasks, which allow us to evaluate the model's generalization ability and to probe the behavior of the gates in more controlled settings. We then evaluate this approach on large-scale Question Answering tasks, including the challenging MS MARCO and SearchQA tasks. Our model shows consistent improvements on both tasks over prior work and our baselines. It has also been shown to generalize significantly better than the baselines on synthetic tasks.

* To appear at ICML 2018
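
The gating idea can be sketched as follows. This is a minimal, hypothetical illustration rather than the authors' model: it assumes a logistic gate over a linear score of the current hidden state and the context embedding, and it treats the task reward and baseline as given numbers. The helper names (`gate_prob`, `focus`, `reinforce_logit_grads`) and all weights are made up for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gate_prob(hidden: Sequence[float], context: Sequence[float],
              w_h: Sequence[float], w_c: Sequence[float], bias: float) -> float:
    """P(gate = 1) as a logistic function of the hidden state and context embedding (assumed form)."""
    logit = sum(w * h for w, h in zip(w_h, hidden))
    logit += sum(w * c for w, c in zip(w_c, context))
    return sigmoid(logit + bias)


def focus(hiddens: List[List[float]], context: Sequence[float],
          w_h: Sequence[float], w_c: Sequence[float], bias: float,
          rng: random.Random) -> Tuple[List[int], List[float]]:
    """Sample one discrete gate per token; 1 means 'pass this state to the layer above'."""
    gates, probs = [], []
    for h in hiddens:
        p = gate_prob(h, context, w_h, w_c, bias)
        gates.append(1 if rng.random() < p else 0)
        probs.append(p)
    return gates, probs


def reinforce_logit_grads(gates: List[int], probs: List[float],
                          reward: float, baseline: float) -> List[float]:
    """Score-function (REINFORCE) estimate of d E[reward] / d logit for each gate.

    For a Bernoulli gate with p = sigmoid(s), d/ds log pi(g) = g - p,
    so each logit gets (reward - baseline) * (g - p).
    """
    advantage = reward - baseline
    return [advantage * (g - p) for g, p in zip(gates, probs)]


# Usage: three token states, a 2-d context embedding, hypothetical weights.
rng = random.Random(0)
hiddens = [[0.5, -1.0], [2.0, 0.3], [-0.4, 0.1]]
gates, probs = focus(hiddens, [1.0, 0.0], [1.0, 1.0], [0.5, 0.0], 0.0, rng)
upper_inputs = [h for h, g in zip(hiddens, gates) if g == 1]  # what the upper layer sees
grads = reinforce_logit_grads(gates, probs, reward=1.0, baseline=0.2)
```

Because each gate is a discrete sample, it has no ordinary gradient; the score-function term `(reward - baseline) * (g - p)` is the standard policy-gradient workaround, and the baseline only serves to reduce variance.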

Sparse Attentive Backtracking: Long-Range Credit Assignment in Recurrent Networks

Nov 07, 2017

Nan Rosemary Ke, Anirudh Goyal, Olexa Bilaniuk, Jonathan Binas, Laurent Charlin, Chris Pal, Yoshua Bengio

Abstract: A major drawback of backpropagation through time (BPTT) is the difficulty of learning long-term dependencies, which stems from having to propagate credit information backwards through every single step of the forward computation. This makes BPTT both computationally impractical and biologically implausible. For this reason, full backpropagation through time is rarely used on long sequences, and truncated backpropagation through time is used as a heuristic instead. However, truncation usually leads to biased estimates of the gradient in which longer-term dependencies are ignored. To address this issue, we propose an alternative algorithm, Sparse Attentive Backtracking, which might also be related to principles used by brains to learn long-term dependencies. Sparse Attentive Backtracking learns an attention mechanism over the hidden states of the past and selectively backpropagates through paths with high attention weights. This allows the model to learn long-term dependencies while only backtracking for a small number of time steps, not just from the recent past but also from attended relevant past states.
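
A rough sketch of the sparse selection step, assuming dot-product attention scores and a plain top-k softmax. This is a simplification swapped in for illustration; the paper's exact sparsification rule may differ. The names `sparse_attention` and `attend` are hypothetical. In an autodiff framework, gradients would flow back only into the `k` selected past states, which is what keeps backtracking cheap.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sparse_attention(query: Sequence[float], memory: List[List[float]],
                     k: int) -> Tuple[List[int], List[float]]:
    """Keep the k past states with the highest dot-product scores and softmax over them.

    Returns the selected time steps (in chronological order) and their weights.
    Only these steps would get a backward path to the present.
    """
    scores = [dot(query, m) for m in memory]
    top = sorted(range(len(memory)), key=lambda i: scores[i], reverse=True)[:k]
    top.sort()
    best = max(scores[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(scores[i] - best) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]


def attend(memory: List[List[float]], indices: List[int],
           weights: List[float]) -> List[float]:
    """Weighted sum of the selected past states."""
    out = [0.0] * len(memory[0])
    for i, w in zip(indices, weights):
        for d, value in enumerate(memory[i]):
            out[d] += w * value
    return out


# Usage: four stored hidden states, one query (the current hidden state).
memory = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 0.0]]
idx, w = sparse_attention([1.0, 0.0], memory, k=2)
summary = attend(memory, idx, w)
```

With `k` much smaller than the sequence length, the cost of the backward pass depends on `k` rather than on how far back the relevant state lies.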

Learnable Explicit Density for Continuous Latent Space and Variational Inference

Oct 06, 2017

Chin-Wei Huang, Ahmed Touati, Laurent Dinh, Michal Drozdzal, Mohammad Havaei, Laurent Charlin, Aaron Courville

Abstract: In this paper, we study two aspects of the variational autoencoder (VAE): the prior distribution over the latent variables and its corresponding posterior. First, we decompose the learning of VAEs into layerwise density estimation, and argue that having a flexible prior is beneficial to both sample generation and inference. Second, we analyze the family of inverse autoregressive flows (inverse AF) and show that, with further improvement, inverse AF could serve as a universal approximator of any complicated posterior. Our analysis results in a unified approach to parameterizing a VAE, without restricting ourselves to factorial Gaussians in the latent real space.

* 2 figures, 5 pages, submitted to ICML Principled Approaches to Deep Learning workshop
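
A single inverse autoregressive flow step in its commonly used gated form, as a sketch. The shift `m` and gate `s` come from strictly lower-triangular linear maps (`A`, `B`), a hypothetical stand-in for the autoregressive network used in practice, and the base density is assumed to be a standard normal. The point is the triangular Jacobian, which makes the log-determinant a simple sum.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def iaf_step(z: Sequence[float], A: Matrix, a: Sequence[float],
             B: Matrix, b: Sequence[float]) -> Tuple[List[float], float]:
    """One gated IAF step: z'_i = s_i * z_i + (1 - s_i) * m_i.

    m_i and s_i depend only on z[:i] (only the strict lower triangle of A and B is read),
    so the Jacobian is triangular with diagonal s_i and log|det| = sum(log s_i).
    """
    z_new, log_det = [], 0.0
    for i in range(len(z)):
        m_i = a[i] + sum(A[i][j] * z[j] for j in range(i))
        s_i = sigmoid(b[i] + sum(B[i][j] * z[j] for j in range(i)))
        z_new.append(s_i * z[i] + (1.0 - s_i) * m_i)
        log_det += math.log(s_i)
    return z_new, log_det


def log_std_normal(z: Sequence[float]) -> float:
    """Log density of the assumed standard normal base distribution q(z_0)."""
    return sum(-0.5 * x * x - 0.5 * math.log(2.0 * math.pi) for x in z)


# Usage with hypothetical weights.
z0 = [0.3, -1.2, 0.5]
A = [[0.0, 0.0, 0.0], [0.8, 0.0, 0.0], [-0.5, 0.2, 0.0]]
B = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.3, -0.7, 0.0]]
z1, log_det = iaf_step(z0, A, [0.0, 0.1, -0.2], B, [1.0, 0.5, 0.0])
log_q1 = log_std_normal(z0) - log_det  # change of variables for the transformed sample
```

Stacking several such steps (with the variable order permuted between them) is how richer, non-factorial posteriors are usually built while keeping the density tractable.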

A Survey of Available Corpora for Building Data-Driven Dialogue Systems

Mar 21, 2017

Iulian Vlad Serban, Ryan Lowe, Peter Henderson, Laurent Charlin, Joelle Pineau

Abstract: During the past decade, several areas of speech and language understanding have witnessed substantial breakthroughs from the use of data-driven models. In the area of dialogue systems, the trend is less obvious, and most practical systems are still built through significant engineering and expert knowledge. Nevertheless, several recent results suggest that data-driven approaches are feasible and quite promising. To facilitate research in this area, we have carried out a wide survey of publicly available datasets suitable for data-driven learning of dialogue systems. We discuss important characteristics of these datasets, how they can be used to learn diverse dialogue strategies, and their other potential uses. We also examine methods for transfer learning between datasets and the use of external knowledge. Finally, we discuss the appropriate choice of evaluation metrics for the learning objective.

* 56 pages including references and appendix, 5 tables and 1 figure; Under review for the Dialogue & Discourse journal. Update: paper has been rewritten and now includes several new datasets

How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

Jan 03, 2017

Chia-Wei Liu, Ryan Lowe, Iulian V. Serban, Michael Noseworthy, Laurent Charlin, Joelle Pineau

Abstract: We investigate evaluation metrics for dialogue response generation systems where supervised labels, such as task completion, are not available. Recent works in response generation have adopted metrics from machine translation to compare a model's generated response to a single target response. We show that these metrics correlate very weakly with human judgements in the non-technical Twitter domain, and not at all in the technical Ubuntu domain. We provide quantitative and qualitative results highlighting specific weaknesses in existing metrics, and provide recommendations for future development of better automatic evaluation metrics for dialogue systems.

* The first 4 authors contributed equally. 13 pages, 5 tables, 6 figures. EMNLP 2016
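
The kind of analysis described above can be sketched in a few lines: score each generated response against a single reference with a word-overlap metric, then correlate those scores with human ratings of the same responses. As an assumption for illustration, a simplified unigram F1 stands in for the machine-translation metrics (it is not BLEU), and the example ratings are made-up placeholders, not data from the paper.

```python
import math
from collections import Counter
from typing import Sequence


def unigram_f1(candidate: str, reference: str) -> float:
    """Word-overlap F1 between a generated response and a single reference."""
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum((c & r).values())  # clipped counts of shared words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Usage: placeholder (response, reference) pairs and placeholder human ratings.
pairs = [("i think so", "yes i think so"),
         ("try rebooting", "have you tried a reboot"),
         ("no idea", "check the log file")]
human = [4.0, 3.5, 1.0]
metric = [unigram_f1(c, r) for c, r in pairs]
rho = pearson(metric, human)
```

The second pair illustrates the weakness the abstract points to: a reasonable reply ("try rebooting") shares no words with the single reference, so any pure overlap metric scores it zero.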

Generative Deep Neural Networks for Dialogue: A Short Review

Nov 18, 2016

Iulian Vlad Serban, Ryan Lowe, Laurent Charlin, Joelle Pineau

Abstract: Researchers have recently started investigating deep neural networks for dialogue applications. In particular, generative sequence-to-sequence (Seq2Seq) models have shown promising results for unstructured tasks, such as word-level dialogue response generation. The hope is that such models will be able to leverage massive amounts of data to learn meaningful natural language representations and response generation strategies, while requiring a minimum amount of domain knowledge and hand-crafting. An important challenge is to develop models that can effectively incorporate dialogue context and generate meaningful and diverse responses. In support of this goal, we review recently proposed models based on generative encoder-decoder neural network architectures, and show that these models have a better ability to incorporate long-term dialogue history, to model uncertainty and ambiguity in dialogue, and to generate responses with high-level compositional structure.

* 6 pages, 1 figure, 3 tables; NIPS 2016 workshop on Learning Methods for Dialogue

On the Evaluation of Dialogue Systems with Next Utterance Classification

Jul 23, 2016

Ryan Lowe, Iulian V. Serban, Mike Noseworthy, Laurent Charlin, Joelle Pineau

Abstract: An open challenge in constructing dialogue systems is developing methods for automatically learning dialogue strategies from large amounts of unlabelled data. Recent work has proposed Next-Utterance-Classification (NUC) as a surrogate task for building dialogue systems from text data. In this paper we investigate the performance of humans on this task to validate the relevance of NUC as a method of evaluation. Our results show three main findings: (1) humans are able to correctly classify responses at a rate much better than chance, thus confirming that the task is feasible; (2) human performance levels vary across task domains (we consider 3 datasets) and expertise levels (novices vs. experts), thus showing that a range of performance is possible on this type of task; (3) automated dialogue systems built using state-of-the-art machine learning methods perform similarly to the human novices but worse than the experts, thus confirming the utility of this class of tasks for driving further research in automated dialogue systems.

* Accepted to SIGDIAL 2016 (short paper). 5 pages
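
A common way to score NUC-style tasks is Recall@k: the model (or a human) ranks a small set of candidate responses, and a hit is counted when the true next utterance is among the top k; Recall@1 is plain classification accuracy. The sketch below assumes that setup; the function names and the example scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def recall_at_k(scores: Sequence[float], true_index: int, k: int) -> bool:
    """True if the correct response ranks among the k highest-scoring candidates.

    Ties are broken by candidate order (Python's sort is stable).
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return true_index in ranked[:k]


def mean_recall_at_k(examples: List[Tuple[Sequence[float], int]], k: int) -> float:
    """Fraction of examples whose true response lands in the top k."""
    hits = [recall_at_k(scores, true_index, k) for scores, true_index in examples]
    return sum(hits) / len(hits)


# Usage: each example holds scores for candidate responses and the index of the
# actual next utterance (illustrative numbers only).
examples = [([0.9, 0.1, 0.3], 0), ([0.2, 0.8, 0.5], 2), ([0.4, 0.6, 0.7], 0)]
r_at_1 = mean_recall_at_k(examples, k=1)
r_at_2 = mean_recall_at_k(examples, k=2)
```

Because humans and models answer the same multiple-choice questions, the same function scores both, which is what makes the human-versus-system comparison in the paper possible.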

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

Jun 14, 2016

Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, Yoshua Bengio

Abstract: Sequential data often possesses a hierarchical structure with complex dependencies between subsequences, such as found between the utterances in a dialogue. In an effort to model this kind of generative process, we propose a neural network-based generative architecture, with latent stochastic variables that span a variable number of time steps. We apply the proposed model to the task of dialogue response generation and compare it with recent neural network architectures. We evaluate the model performance through automatic evaluation metrics and by carrying out a human evaluation. The experiments demonstrate that our model improves upon recently proposed models and that the latent variables facilitate the generation of long outputs and maintain the context.

* 15 pages, 5 tables, 4 figures
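
Latent-variable encoder-decoders of this kind are typically trained with a variational lower bound: a per-utterance latent vector is sampled from an approximate posterior during training, and a KL term pulls that posterior toward a context-conditioned prior. The sketch below assumes, as is standard for such models, that both distributions are diagonal Gaussians; the encoder outputs are hypothetical numbers and the function names are illustrative.

```python
import math
import random
from typing import List, Sequence


def sample_latent(mu: Sequence[float], log_var: Sequence[float],
                  rng: random.Random) -> List[float]:
    """Reparameterised sample z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_diag_gaussians(mu_q: Sequence[float], logvar_q: Sequence[float],
                      mu_p: Sequence[float], logvar_p: Sequence[float]) -> float:
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        kl += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return kl


# Usage with hypothetical encoder outputs for one utterance.
rng = random.Random(0)
prior_mu, prior_logvar = [0.0, 0.0], [0.0, 0.0]    # prior given previous utterances
post_mu, post_logvar = [0.5, -0.3], [-1.0, -0.5]   # posterior also sees the current utterance
z = sample_latent(post_mu, post_logvar, rng)       # would condition the decoder in training
kl_term = kl_diag_gaussians(post_mu, post_logvar, prior_mu, prior_logvar)
```

At generation time the latent vector would instead be drawn from the prior, which is what lets a single sampled vector shape a whole utterance.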

Modeling User Exposure in Recommendation

Feb 04, 2016

Dawen Liang, Laurent Charlin, James McInerney, David M. Blei

Abstract: Collaborative filtering analyzes user preferences for items (e.g., books, movies, restaurants, academic papers) by exploiting the similarity patterns across users. In implicit feedback settings, all the items, including the ones that a user did not consume, are taken into consideration. But this assumption does not accord with the common-sense understanding that users have a limited scope and awareness of items. For example, a user might not have heard of a certain paper, or might live too far away from a restaurant to experience it. In the language of causal analysis, the assignment mechanism (i.e., the items that a user is exposed to) is a latent variable that may change for various user/item combinations. In this paper, we propose a new probabilistic approach that directly incorporates user exposure to items into collaborative filtering. The exposure is modeled as a latent variable and the model infers its value from data. In doing so, we recover one of the most successful state-of-the-art approaches as a special case of our model, and provide a plug-in method for conditioning exposure on various forms of exposure covariates (e.g., topics in text, venue locations). We show that our scalable inference algorithm outperforms existing benchmarks in four different domains both with and without exposure covariates.

* 11 pages, 4 figures. WWW'16
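
The central inference quantity can be sketched as follows, assuming a Gaussian model for the observed click matrix: when an item was not consumed, the posterior probability that the user was exposed to it trades off the prior exposure rate `mu` against how plausible a zero is under the user's predicted preference. The parameter names (`theta`, `beta`, `mu`, `lambda_y`) and the example factors are assumptions for illustration.

```python
import math
from typing import Sequence


def normal_pdf(x: float, mean: float, precision: float) -> float:
    return math.sqrt(precision / (2.0 * math.pi)) * math.exp(-0.5 * precision * (x - mean) ** 2)


def exposure_posterior(y: float, theta: Sequence[float], beta: Sequence[float],
                       mu: float, lambda_y: float) -> float:
    """Posterior probability that the user was exposed to the item.

    Assumed model: exposure a ~ Bernoulli(mu); if a = 1, y ~ N(theta . beta, 1 / lambda_y);
    if a = 0, y = 0 with probability one.
    """
    if y != 0:
        return 1.0  # a consumed item must have been seen
    pred = sum(t * b for t, b in zip(theta, beta))
    seen = mu * normal_pdf(0.0, pred, lambda_y)  # exposed, but chose not to consume
    unseen = 1.0 - mu                            # never exposed, so y = 0 for sure
    return seen / (seen + unseen)


# Usage: an unclicked item the model predicts the user would like...
low = exposure_posterior(0, [2.0, 1.0], [1.0, 0.5], mu=0.5, lambda_y=1.0)
# ...versus an unclicked item the model predicts the user would not care about.
high = exposure_posterior(0, [0.1, 0.0], [0.1, 0.0], mu=0.5, lambda_y=1.0)
```

The ordering `low < high` captures the intuition in the abstract: a non-click on an item the user would probably like is better explained by the user never having seen it, so it is down-weighted as negative evidence.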

Dynamic Poisson Factorization

Sep 15, 2015

Laurent Charlin, Rajesh Ranganath, James McInerney, David M. Blei

Abstract: Models for recommender systems use latent factors to explain the preferences and behaviors of users with respect to a set of items (e.g., movies, books, academic papers). Typically, the latent factors are assumed to be static and, given these factors, the observed preferences and behaviors of users are assumed to be generated without order. These assumptions limit the explorative and predictive capabilities of such models, since users' interests and item popularity may evolve over time. To address this, we propose dPF, a dynamic matrix factorization model based on the recent Poisson factorization model for recommendations. dPF models the time-evolving latent factors with a Kalman filter and the actions with Poisson distributions. We derive a scalable variational inference algorithm to infer the latent factors. Finally, we demonstrate dPF on 10 years of user click data from arXiv.org, one of the largest repositories of scientific papers and a formidable source of information about the behavior of scientists. Empirically, we show performance improvement over both static and, more recently proposed, dynamic recommendation models. We also provide a thorough exploration of the inferred posteriors over the latent variables.

* RecSys 2015
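
A toy simulation of the generative story can make the model structure concrete. This sketch only samples data and does no inference. It assumes latent factors that drift as a Gaussian random walk (the linear-Gaussian state-space setting a Kalman filter handles), an exponential map to keep Poisson rates positive, and an arbitrary initial distribution; the paper's exact parameterization may differ (for example, it may also include static components). The names `simulate`, `drift`, and `sample_poisson` are hypothetical.

```python
import math
import random
from typing import List


def sample_poisson(rate: float, rng: random.Random) -> int:
    """Knuth's multiplication method (adequate for the small rates used here)."""
    threshold = math.exp(-rate)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate(n_users: int, n_items: int, n_factors: int, n_steps: int,
             drift: float, rng: random.Random) -> List[List[List[int]]]:
    """Return counts[t][u][i] from a toy dynamic Poisson factorization."""
    def init(n: int) -> List[List[float]]:
        # Assumed initial factors, centred below zero to keep rates small.
        return [[rng.gauss(-1.0, 0.5) for _ in range(n_factors)] for _ in range(n)]

    def drift_step(f: List[List[float]]) -> List[List[float]]:
        # Gaussian random walk: each factor moves by N(0, drift^2) per time step.
        return [[x + rng.gauss(0.0, drift) for x in row] for row in f]

    users, items = init(n_users), init(n_items)
    counts = []
    for t in range(n_steps):
        if t > 0:
            users, items = drift_step(users), drift_step(items)
        # Rate for (u, i) at time t: sum_k exp(u_k) * exp(v_k), always positive.
        counts.append([[sample_poisson(sum(math.exp(a) * math.exp(b) for a, b in zip(u, v)), rng)
                        for v in items] for u in users])
    return counts


# Usage: 3 users, 4 items, 2 factors, 5 time steps.
counts = simulate(n_users=3, n_items=4, n_factors=2, n_steps=5, drift=0.1, rng=random.Random(0))
```

Because the Poisson likelihood is not Gaussian, the Kalman-filter updates cannot be applied exactly to real click counts, which is why a variational inference algorithm is needed to fit a model like this.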
