Iulian V. Serban

Deep Discourse Analysis for Generating Personalized Feedback in Intelligent Tutor Systems

Mar 13, 2021
Matt Grenander, Robert Belfer, Ekaterina Kochmar, Iulian V. Serban, François St-Hilaire, Jackie C. K. Cheung

We explore creating automated, personalized feedback in an intelligent tutoring system (ITS). Our goal is to pinpoint correct and incorrect concepts in student answers in order to achieve better student learning gains. Although automatic methods for providing personalized feedback exist, they do not explicitly inform students about which concepts in their answers are correct or incorrect. Our approach involves decomposing students' answers using neural discourse segmentation and classification techniques. This decomposition yields a relational graph over all discourse units covered by the reference solutions and student answers. We use this inferred relational graph structure and a neural classifier to match student answers with reference solutions and generate personalized feedback. Although the process is completely automated and data-driven, the personalized feedback generated is highly contextual, domain-aware and effectively targets each student's misconceptions and knowledge gaps. We test our method in a dialogue-based ITS and demonstrate that our approach results in high-quality feedback and significantly improved student learning gains.
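
A minimal sketch of the matching step described above. TF-IDF cosine similarity and sentence splitting stand in for the paper's neural discourse segmenter and classifier, and the similarity threshold is an assumption, not a value from the paper:

```python
# Hedged sketch: match student discourse units against reference-solution
# units. The paper uses neural discourse segmentation and a neural
# classifier; here segmentation is approximated by sentence splitting and
# matching by TF-IDF cosine similarity, for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def segment(text):
    # Crude stand-in for a neural discourse segmenter.
    return [s.strip() for s in text.split(".") if s.strip()]

def match_units(student_answer, reference_solution, threshold=0.3):
    student_units = segment(student_answer)
    reference_units = segment(reference_solution)
    vec = TfidfVectorizer().fit(student_units + reference_units)
    sims = cosine_similarity(vec.transform(student_units),
                             vec.transform(reference_units))
    feedback = []
    for unit, row in zip(student_units, sims):
        best = row.argmax()
        if row[best] >= threshold:
            feedback.append((unit, "matches reference", reference_units[best]))
        else:
            feedback.append((unit, "possible misconception", None))
    return feedback

print(match_units(
    "Force equals mass times acceleration. Heavier objects fall faster.",
    "Force equals mass times acceleration. All objects fall at the same rate."))
```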

* Accepted at EAAI 2021 

A Deep Reinforcement Learning Chatbot (Short Version)

Jan 20, 2018
Iulian V. Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke, Sai Rajeswar, Alexandre de Brebisson, Jose M. R. Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau, Yoshua Bengio

We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including neural network and template-based models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than other systems. The results highlight the potential of coupling ensemble systems with deep reinforcement learning as a fruitful path for developing real-world, open-domain conversational agents.
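
A minimal sketch of the selection step: a learned policy scores each candidate response and returns the highest-scoring one. The linear scorer and feature function below are illustrative assumptions, not the paper's actual model:

```python
# Hedged sketch of response selection: the ensemble proposes candidate
# responses, and a learned policy scores each candidate and returns the
# argmax. The features and linear scorer here are placeholders; in the
# paper the policy is trained with reinforcement learning.
import numpy as np

def features(dialogue_history, candidate):
    # Placeholder features: word overlap with history and a length term.
    hist = set(" ".join(dialogue_history).lower().split())
    cand = set(candidate.lower().split())
    overlap = len(hist & cand) / max(len(cand), 1)
    return np.array([1.0, overlap, min(len(cand) / 20.0, 1.0)])

def select_response(dialogue_history, candidates, weights):
    scores = [weights @ features(dialogue_history, c) for c in candidates]
    return candidates[int(np.argmax(scores))]

weights = np.array([0.1, 1.0, 0.5])   # would be learned from rewards
history = ["what do you like to do for fun?"]
candidates = ["I enjoy talking about movies.", "Error.", "Fun is fun."]
print(select_response(history, candidates, weights))
```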

* 9 pages, 1 figure, 2 tables; presented at NIPS 2017, Conversational AI: "Today's Practice and Tomorrow's Potential" Workshop 

Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses

Jan 16, 2018
Ryan Lowe, Michael Noseworthy, Iulian V. Serban, Nicolas Angelard-Gontier, Yoshua Bengio, Joelle Pineau

Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem. Unfortunately, existing automatic evaluation metrics are biased and correlate very poorly with human judgements of response quality. Yet having an accurate automatic evaluation procedure is crucial for dialogue research, as it allows rapid prototyping and testing of new models with fewer expensive human evaluations. In response to this challenge, we formulate automatic dialogue evaluation as a learning problem. We present an evaluation model (ADEM) that learns to predict human-like scores for input responses, using a new dataset of human response scores. We show that the ADEM model's predictions correlate significantly, and at a level much higher than word-overlap metrics such as BLEU, with human judgements at both the utterance and system level. We also show that ADEM can generalize to evaluating dialogue models unseen during training, an important step for automatic dialogue evaluation.
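
ADEM scores a model response with a bilinear match against both the dialogue context and the reference response. A minimal numpy sketch of that scoring form; in the paper the encodings come from a pretrained hierarchical RNN and the matrices and scaling constants are learned, whereas everything here is random, just to show shapes:

```python
# Hedged sketch of ADEM-style scoring: a bilinear match between the
# model response r_hat and both the context c and reference response r,
#   score = (c^T M r_hat + r^T N r_hat - alpha) / beta.
# In the paper c, r, r_hat are RNN encodings and M, N, alpha, beta are
# learned; here all values are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                  # encoding dimension (illustrative)
M, N = rng.normal(size=(d, d)), rng.normal(size=(d, d))
alpha, beta = 0.0, 1.0                 # learned scaling constants in the paper

def adem_score(c, r, r_hat):
    return (c @ M @ r_hat + r @ N @ r_hat - alpha) / beta

c, r, r_hat = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
print(adem_score(c, r, r_hat))
```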

* Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (2017), pp. 1116-1126  
* ACL 2017 

A Deep Reinforcement Learning Chatbot

Nov 05, 2017
Iulian V. Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke, Sai Rajeshwar, Alexandre de Brebisson, Jose M. R. Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau, Yoshua Bengio

We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including template-based models, bag-of-words models, sequence-to-sequence neural network and latent variable neural network models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than many competing systems. Due to its machine learning architecture, the system is likely to improve with additional data.
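
A minimal sketch of the ensemble interface implied by the abstract: heterogeneous response models (template-based, retrieval, neural) expose a common candidate-generation API, and a selection policy (sketched under the short-version entry above) chooses among their outputs. All class and method names are illustrative assumptions:

```python
# Hedged sketch of the ensemble architecture: each response model,
# whatever its internals (templates, retrieval, seq2seq), implements the
# same candidates() interface; a selection policy then picks one
# response. Names and toy internals are assumptions, not the paper's.
class TemplateModel:
    def candidates(self, history):
        return ["Could you tell me more about that?"]

class RetrievalModel:
    def __init__(self, corpus):
        self.corpus = corpus
    def candidates(self, history):
        # Toy retrieval: return the corpus utterance sharing most words
        # with the last turn.
        last = set(history[-1].lower().split())
        return [max(self.corpus,
                    key=lambda u: len(last & set(u.lower().split())))]

class Seq2SeqModel:
    def candidates(self, history):
        return ["I think that is really interesting."]  # stub generation

def propose(history, models):
    return [c for m in models for c in m.candidates(history)]

models = [TemplateModel(),
          RetrievalModel(["I love movies too.", "The weather is nice."]),
          Seq2SeqModel()]
print(propose(["do you like movies?"], models))
```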

* 40 pages, 9 figures, 11 tables 

Piecewise Latent Variables for Neural Variational Text Processing

Sep 23, 2017
Iulian V. Serban, Alexander G. Ororbia II, Joelle Pineau, Aaron Courville

Advances in neural variational inference have facilitated the learning of powerful directed graphical models with continuous latent variables, such as variational autoencoders. The hope is that such models will learn to represent rich, multi-modal latent factors in real-world data, such as natural language text. However, current models often assume simplistic priors on the latent variables - such as the uni-modal Gaussian distribution - which are incapable of representing complex latent factors efficiently. To overcome this restriction, we propose the simple, but highly flexible, piecewise constant distribution. This distribution has the capacity to represent an exponential number of modes of a latent target distribution, while remaining mathematically tractable. Our results demonstrate that incorporating this new latent distribution into different models yields substantial improvements in natural language processing tasks such as document modeling and natural language generation for dialogue.
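
A minimal sketch of sampling from a piecewise constant distribution on [0, 1] with n equal-width pieces: pick a piece in proportion to its weight, then sample uniformly within it. The weights below are arbitrary stand-ins for the learned parameters, chosen to show a multi-modal shape:

```python
# Hedged sketch: a piecewise constant density on [0, 1] with n equal
# pieces. Sampling is exact via the inverse-CDF trick: choose a piece
# with probability proportional to its weight, then draw uniformly
# inside that piece. In the paper the piece weights are learned.
import numpy as np

def sample_piecewise(weights, size, rng):
    probs = np.asarray(weights, dtype=float)
    probs /= probs.sum()
    n = len(probs)
    pieces = rng.choice(n, size=size, p=probs)        # which piece
    return (pieces + rng.uniform(size=size)) / n      # uniform within it

rng = np.random.default_rng(0)
samples = sample_piecewise([5.0, 0.1, 0.1, 5.0], size=10000, rng=rng)
hist, _ = np.histogram(samples, bins=4, range=(0.0, 1.0))
print(hist)   # mass concentrates in the first and last pieces (two modes)
```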

* 19 pages, 2 figures, 8 tables; EMNLP 2017 

How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

Jan 03, 2017
Chia-Wei Liu, Ryan Lowe, Iulian V. Serban, Michael Noseworthy, Laurent Charlin, Joelle Pineau

We investigate evaluation metrics for dialogue response generation systems where supervised labels, such as task completion, are not available. Recent works in response generation have adopted metrics from machine translation to compare a model's generated response to a single target response. We show that these metrics correlate very weakly with human judgements in the non-technical Twitter domain, and not at all in the technical Ubuntu domain. We provide quantitative and qualitative results highlighting specific weaknesses in existing metrics, and provide recommendations for future development of better automatic evaluation metrics for dialogue systems.
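
A minimal sketch of the correlation analysis this kind of study performs: score each generated response against its single target with a word-overlap metric (sentence-level BLEU via nltk), then measure rank correlation with human judgements. The toy data below is invented; the paper uses Twitter and Ubuntu dialogues:

```python
# Hedged sketch of the correlation analysis: score responses against a
# single target with sentence-level BLEU, then correlate with human
# quality ratings via Spearman's rho. All data here is invented.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from scipy.stats import spearmanr

targets = ["see you tomorrow at the station",
           "try restarting the network service",
           "that movie was great"]
responses = ["see you at the station tomorrow",
             "i do not know sorry",
             "yeah the movie was great"]
human_scores = [5, 1, 4]   # invented human quality ratings

smooth = SmoothingFunction().method1
bleu = [sentence_bleu([t.split()], r.split(), smoothing_function=smooth)
        for t, r in zip(targets, responses)]

rho, p = spearmanr(bleu, human_scores)
print(f"BLEU scores: {bleu}\nSpearman rho = {rho:.2f} (p = {p:.2f})")
```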

* First 4 authors had equal contribution. 13 pages, 5 tables, 6 figures. EMNLP 2016 

On the Evaluation of Dialogue Systems with Next Utterance Classification

Jul 23, 2016
Ryan Lowe, Iulian V. Serban, Mike Noseworthy, Laurent Charlin, Joelle Pineau

An open challenge in constructing dialogue systems is developing methods for automatically learning dialogue strategies from large amounts of unlabelled data. Recent work has proposed Next-Utterance-Classification (NUC) as a surrogate task for building dialogue systems from text data. In this paper we investigate the performance of humans on this task to validate the relevance of NUC as a method of evaluation. Our results show three main findings: (1) humans are able to correctly classify responses at a rate much better than chance, thus confirming that the task is feasible; (2) human performance levels vary across task domains (we consider 3 datasets) and expertise levels (novices vs. experts), thus showing that a range of performance is possible on this type of task; (3) automated dialogue systems built using state-of-the-art machine learning methods have similar performance to the human novices, but worse than the experts, thus confirming the utility of this class of tasks for driving further research in automated dialogue systems.
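
A minimal sketch of the NUC evaluation itself: given a context and N candidate next utterances (one true, N-1 distractors), a model ranks the candidates and is scored by Recall@k, e.g. 1-in-10 Recall@1. The random scorer below is a placeholder, so it lands at chance level:

```python
# Hedged sketch of Next-Utterance-Classification scoring: for each
# context, a model ranks N candidate responses (one of which is the true
# next utterance), and Recall@k counts how often the true response lands
# in the top k. The scorer here is random, so expected 1-in-10 Recall@1
# is chance level, about 0.1.
import numpy as np

def recall_at_k(score_fn, contexts, candidate_sets, true_idx, k=1):
    hits = 0
    for ctx, cands, t in zip(contexts, candidate_sets, true_idx):
        scores = [score_fn(ctx, c) for c in cands]
        top_k = np.argsort(scores)[::-1][:k]
        hits += int(t in top_k)
    return hits / len(contexts)

rng = np.random.default_rng(0)
random_model = lambda ctx, cand: rng.random()   # placeholder scorer

contexts = [f"context {i}" for i in range(1000)]
candidate_sets = [[f"cand {j}" for j in range(10)] for _ in contexts]
true_idx = [0] * len(contexts)

print(recall_at_k(random_model, contexts, candidate_sets, true_idx, k=1))
```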

* Accepted to SIGDIAL 2016 (short paper). 5 pages 

Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

Apr 06, 2016
Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau

We investigate the task of building open-domain conversational dialogue systems based on large dialogue corpora using generative models. Generative models produce system responses that are autonomously generated word-by-word, opening up the possibility for realistic, flexible interactions. In support of this goal, we extend the recently proposed hierarchical recurrent encoder-decoder neural network to the dialogue domain, and demonstrate that this model is competitive with state-of-the-art neural language models and back-off n-gram models. We investigate the limitations of this and similar approaches, and show how its performance can be improved by bootstrapping the learning from a larger question-answer pair corpus and from pretrained word embeddings.
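
A minimal PyTorch sketch of the hierarchical encoder-decoder idea: one GRU encodes each utterance into a vector, a second GRU runs over those vectors to track dialogue context, and a decoder GRU generates the next utterance conditioned on the context state. Dimensions, vocabulary, and teacher forcing are toy simplifications, not the paper's configuration:

```python
# Hedged sketch of a hierarchical recurrent encoder-decoder (HRED) for
# dialogue: utterance-level GRU encoder -> context-level GRU over
# utterance vectors -> decoder GRU initialized with the context state.
# All sizes are toy values for illustration.
import torch
import torch.nn as nn

class HRED(nn.Module):
    def __init__(self, vocab_size, emb=32, utt_hid=64, ctx_hid=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.utt_enc = nn.GRU(emb, utt_hid, batch_first=True)
        self.ctx_enc = nn.GRU(utt_hid, ctx_hid, batch_first=True)
        self.decoder = nn.GRU(emb, ctx_hid, batch_first=True)
        self.out = nn.Linear(ctx_hid, vocab_size)

    def forward(self, dialogue, reply):
        # dialogue: (batch, n_utts, utt_len) token ids
        # reply:    (batch, reply_len) token ids to decode (teacher forcing)
        b, n, t = dialogue.shape
        utts = self.embed(dialogue.view(b * n, t))
        _, h = self.utt_enc(utts)                 # (1, b*n, utt_hid)
        utt_vecs = h.squeeze(0).view(b, n, -1)    # one vector per utterance
        _, ctx = self.ctx_enc(utt_vecs)           # (1, b, ctx_hid)
        dec_out, _ = self.decoder(self.embed(reply), ctx)
        return self.out(dec_out)                  # (b, reply_len, vocab)

model = HRED(vocab_size=100)
dialogue = torch.randint(0, 100, (2, 3, 5))  # 2 dialogues, 3 utts, 5 tokens
reply = torch.randint(0, 100, (2, 6))
print(model(dialogue, reply).shape)          # torch.Size([2, 6, 100])
```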

* 8 pages with references; Published in AAAI 2016 (Special Track on Cognitive Systems) 