Saizheng Zhang

Personalizing Dialogue Agents: I have a dog, do you have pets too?

Sep 25, 2018
Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, Jason Weston

Chit-chat models are known to have several problems: they lack specificity, do not display a consistent personality, and are often not very captivating. In this work we present the task of making chit-chat more engaging by conditioning on profile information. We collect data and train models to condition on (i) their own given profile information and (ii) information about the person they are talking to, resulting in improved dialogues, as measured by next utterance prediction. Since (ii) is initially unknown, our model is trained to engage its partner with personal topics, and we show that the resulting dialogue can be used to predict profile information about the interlocutors.
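
A minimal sketch of the conditioning idea, assuming a simple ranking setup: prepend the persona to the dialogue history, encode both sides, and pick the highest-scoring candidate utterance. The bag-of-words scorer below is an illustrative stand-in for the paper's trained neural encoders; all names are hypothetical.

```python
# Minimal sketch of persona-conditioned next-utterance ranking.
# A real system uses trained neural encoders; a bag-of-words scorer
# stands in here so the sketch stays self-contained and runnable.
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words 'encoder' standing in for a trained model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_next_utterance(persona, history, candidates):
    # Condition on the profile by prepending it to the dialogue context.
    context = embed(" ".join(persona + history))
    return max(candidates, key=lambda c: cosine(context, embed(c)))

persona = ["i have a dog", "i like hiking"]
history = ["do you have pets too?"]
candidates = ["yes, a dog who joins me on hikes.", "i work in finance."]
print(rank_next_utterance(persona, history, candidates))
```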

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

Sep 25, 2018
Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning

Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide the sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and to explain their predictions; (4) we offer a new type of factoid comparison question to test QA systems' ability to extract relevant facts and perform the necessary comparisons. We show that HotpotQA is challenging for the latest QA systems, and that the supporting facts enable models to improve performance and make explainable predictions.

* EMNLP 2018 long paper. The first three authors contributed equally. Data, code, and blog posts available at https://hotpotqa.github.io/ 
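
The sentence-level supervision is easiest to see in the data layout. Below is a sketch of one example; the comparison question is taken from the paper, and the field names follow the released JSON format but should be treated as illustrative (verify against https://hotpotqa.github.io/).

```python
# Hedged sketch of one HotpotQA-style example.
example = {
    "question": "Which magazine was started first, Arthur's Magazine or First for Women?",
    "answer": "Arthur's Magazine",
    "type": "comparison",        # comparison vs. multi-hop "bridge" questions
    "context": [                 # list of [title, [sentence, ...]] paragraphs
        ["Arthur's Magazine",
         ["Arthur's Magazine (1844-1846) was an American literary periodical."]],
        ["First for Women",
         ["First for Women is a woman's magazine launched in 1989."]],
    ],
    "supporting_facts": [        # sentence-level supervision: [title, sentence index]
        ["Arthur's Magazine", 0],
        ["First for Women", 0],
    ],
}

# A QA system must both answer and recover the supporting sentences:
gold = {(title, idx) for title, idx in example["supporting_facts"]}
print(example["answer"], gold)
```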

Neural Models for Key Phrase Detection and Question Generation

May 30, 2018
Sandeep Subramanian, Tong Wang, Xingdi Yuan, Saizheng Zhang, Yoshua Bengio, Adam Trischler

We propose a two-stage neural model to tackle question generation from documents. First, the model estimates the probability that a word sequence in a document is one a human would pick as a candidate answer; this stage is a neural key-phrase extractor trained on the answers in a question-answering corpus. The predicted key phrases then act as target answers and condition a sequence-to-sequence question-generation model with a copy mechanism. Empirically, our key-phrase extraction model significantly outperforms an entity-tagging baseline and existing rule-based approaches, and we further demonstrate that our question generation system formulates fluent, answerable questions from key phrases. This two-stage system could be used to augment or generate reading comprehension datasets, which may be leveraged to improve machine reading systems or used in educational settings.

* Machine Reading for Question Answering workshop at ACL 2018 
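
A hedged sketch of how the two stages compose: stage 1 proposes answer key phrases, stage 2 generates one question per key phrase. Both models are stubbed with toy stand-ins; all function names are hypothetical.

```python
# Hedged sketch of the two-stage pipeline described above. In the paper
# the stages are a neural key-phrase extractor and a seq2seq generator
# with a copy mechanism; toy stubs stand in for both here.
def extract_key_phrases(document, k=2):
    # Stage 1 stub: score spans by P(span is an answer | document).
    # Naive stand-in: propose capitalized unigrams.
    return [w.strip(".,") for w in document.split() if w[0].isupper()][:k]

def generate_question(document, answer):
    # Stage 2 stub: a real model decodes conditioned on (document, answer),
    # copying rare source words via the copy mechanism.
    return f"What does the passage say about {answer}?"

document = "Montreal is home to MILA, a machine learning institute."
for answer in extract_key_phrases(document):
    print(answer, "->", generate_question(document, answer))
```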

Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent

Apr 16, 2018
Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela, Jason Weston

In contrast to most natural language processing research, which makes use of static datasets, humans learn language interactively, grounded in an environment. In this work we propose an interactive learning procedure called Mechanical Turker Descent (MTD) and use it to train agents to execute natural language commands grounded in a fantasy text adventure game. In MTD, Turkers compete to train better agents in the short term, and collaborate by sharing their agents' skills in the long term. This results in a gamified, engaging experience for the Turkers and a higher-quality teaching signal for the agents than static datasets provide, as the Turkers naturally adapt the training data to the agent's abilities.
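
A compressed, hedged sketch of one MTD round: each Turker teaches a private dataset, agents are cross-evaluated on the other Turkers' data, and the winning data joins the shared pool. The real procedure's scoring and data pooling are more involved; all names here are illustrative.

```python
# Compressed sketch of one Mechanical Turker Descent round: short-term
# competition (private per-Turker datasets, cross-evaluation) and
# long-term collaboration (winning data merged into the shared pool).
# collect/train/evaluate are stubs for the real interactive pieces.
def mtd_round(turkers, shared_pool, collect, train, evaluate):
    private = {t: collect(t) for t in turkers}        # each Turker teaches an agent
    agents = {t: train(shared_pool + private[t]) for t in turkers}
    # Score each agent on the *other* Turkers' data.
    scores = {t: sum(evaluate(agents[t], private[u])
                     for u in turkers if u != t) for t in turkers}
    winner = max(scores, key=scores.get)              # competition
    return shared_pool + private[winner], winner      # collaboration
```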

A Deep Reinforcement Learning Chatbot (Short Version)

Jan 20, 2018
Iulian V. Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke, Sai Rajeswar, Alexandre de Brebisson, Jose M. R. Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau, Yoshua Bengio

We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including neural network and template-based models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than other systems. The results highlight the potential of coupling ensemble systems with deep reinforcement learning as a fruitful path for developing real-world, open-domain conversational agents.

* 9 pages, 1 figure, 2 tables; presented at NIPS 2017, Conversational AI: "Today's Practice and Tomorrow's Potential" Workshop 
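
A minimal sketch of the selection step: every model in the ensemble proposes a candidate response and a learned scorer picks one. The scorer stands in for the value function the paper trains with reinforcement learning; all names are illustrative.

```python
# Hedged sketch of the selection policy: each ensemble member proposes a
# candidate, and a learned scorer (trained with RL on crowdsourced and
# real-user signals in the paper) picks one. score() is a stub.
def respond(dialogue_history, ensemble, score):
    candidates = [model(dialogue_history) for model in ensemble]
    return max(candidates, key=lambda c: score(dialogue_history, c))

# Illustrative usage with toy stand-ins:
ensemble = [lambda h: "Tell me more!", lambda h: "I like dogs."]
score = lambda h, c: len(set(h.split()) & set(c.lower().split()))
print(respond("do you like dogs", ensemble, score))
```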

A Deep Reinforcement Learning Chatbot

Nov 05, 2017
Iulian V. Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke, Sai Rajeshwar, Alexandre de Brebisson, Jose M. R. Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau, Yoshua Bengio

We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including template-based models, bag-of-words models, sequence-to-sequence neural network and latent variable neural network models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than many competing systems. Due to its machine learning architecture, the system is likely to improve with additional data.

* 40 pages, 9 figures, 11 tables 
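
One plausible, simplified way to bootstrap such a response scorer from crowdsourced labels (not the paper's exact method) is supervised regression from (context, candidate) features to human quality ratings, later refined on real-user interactions:

```python
# Hedged sketch: fit a linear scorer on crowdsourced quality ratings.
# The features and learning rule in the actual system are richer.
import numpy as np

def featurize(context, candidate):
    # Toy features standing in for the richer ones a real system uses.
    overlap = len(set(context.split()) & set(candidate.split()))
    return np.array([1.0, overlap, len(candidate.split())])

contexts = ["do you like dogs", "what is the weather"]
candidates = ["i love dogs", "i am a chatbot"]
ratings = np.array([4.0, 2.0])             # crowdsourced quality labels (1-5)

X = np.stack([featurize(c, r) for c, r in zip(contexts, candidates)])
w, *_ = np.linalg.lstsq(X, ratings, rcond=None)
score = lambda c, r: featurize(c, r) @ w   # plug into the selection policy
print(score("do you like dogs", "i love dogs"))
```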

Machine Comprehension by Text-to-Text Neural Question Generation

May 15, 2017
Xingdi Yuan, Tong Wang, Caglar Gulcehre, Alessandro Sordoni, Philip Bachman, Sandeep Subramanian, Saizheng Zhang, Adam Trischler

We propose a recurrent neural model that generates natural-language questions from documents, conditioned on answers. We show how to train the model using a combination of supervised and reinforcement learning. After standard maximum-likelihood training with teacher forcing, we fine-tune the model using policy-gradient techniques to maximize several rewards that measure question quality; most notably, one of these rewards is the performance of a question-answering system. We motivate question generation as a means to improve the performance of question answering systems. Our model is trained and evaluated on the recent question-answering dataset SQuAD.
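
A hedged sketch of the policy-gradient fine-tuning step: sample a question, reward it (e.g., by whether a QA system can answer it), and reinforce the sampled tokens. `model.sample` and `qa_reward` are assumed interfaces, not the paper's code; a real setup would also keep the maximum-likelihood term and a learned baseline.

```python
# Hedged REINFORCE sketch for question-generation fine-tuning.
# model.sample is an assumed API returning (question, log-prob tensor);
# qa_reward is a stub, e.g. the QA system's accuracy on the question.
import torch

def reinforce_step(model, optimizer, document, answer, qa_reward, baseline=0.0):
    question, log_prob = model.sample(document, answer)   # assumed API
    reward = qa_reward(document, question, answer)        # scalar in [0, 1]
    loss = -(reward - baseline) * log_prob                # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return question, reward
```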

Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks

Jan 10, 2017
Ying Zhang, Mohammad Pezeshki, Philemon Brakel, Saizheng Zhang, Cesar Laurent, Yoshua Bengio, Aaron Courville

Convolutional Neural Networks (CNNs) are effective models for reducing spectral variations and modeling spectral correlations in acoustic features for automatic speech recognition (ASR). Hybrid speech recognition systems that combine CNNs with Hidden Markov Models/Gaussian Mixture Models (HMMs/GMMs) have achieved state-of-the-art results on various benchmarks. Meanwhile, Connectionist Temporal Classification (CTC) with Recurrent Neural Networks (RNNs), which was proposed for labeling unsegmented sequences, makes it feasible to train an end-to-end speech recognition system rather than a hybrid one. However, RNNs are computationally expensive and sometimes difficult to train. In this paper, inspired by the advantages of both CNNs and the CTC approach, we propose an end-to-end framework for sequence labeling that combines hierarchical CNNs directly with CTC, without recurrent connections. Evaluating the approach on the TIMIT phoneme recognition task, we show that the proposed model is not only computationally efficient but also competitive with existing baseline systems. Moreover, we argue that CNNs have the capability to model temporal correlations with appropriate context information.
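
A minimal PyTorch sketch of the idea: a purely convolutional encoder over acoustic features, per-frame phoneme logits, and a CTC loss, with no recurrent connections. Layer sizes and hyperparameters are illustrative, not the paper's configuration.

```python
# Hedged sketch: hierarchical convolutions + CTC, no recurrence.
import torch
import torch.nn as nn

n_mels, n_classes = 40, 62              # e.g. 61 TIMIT phones + 1 CTC blank

conv = nn.Sequential(                   # input: (batch, 1, time, n_mels)
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d((1, 2)),               # pool over frequency only; keep time
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
)
classifier = nn.Linear(32 * (n_mels // 2), n_classes)
ctc = nn.CTCLoss(blank=0)

x = torch.randn(8, 1, 100, n_mels)      # batch of 8 utterances, 100 frames
h = conv(x)                             # (8, 32, 100, 20)
h = h.permute(2, 0, 1, 3).flatten(2)    # (time, batch, 32 * 20)
log_probs = classifier(h).log_softmax(-1)

targets = torch.randint(1, n_classes, (8, 30))   # dummy phone sequences
loss = ctc(log_probs, targets,
           input_lengths=torch.full((8,), 100),
           target_lengths=torch.full((8,), 30))
```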

On Multiplicative Integration with Recurrent Neural Networks

Nov 12, 2016
Yuhuai Wu, Saizheng Zhang, Ying Zhang, Yoshua Bengio, Ruslan Salakhutdinov

We introduce a general and simple structural design called Multiplicative Integration (MI) to improve recurrent neural networks (RNNs). MI changes the way in which information from different sources flows and is integrated in the computational building block of an RNN, while introducing almost no extra parameters. The new structure can be easily embedded into many popular RNN models, including LSTMs and GRUs. We empirically analyze its learning behaviour and conduct evaluations on several tasks using different RNN models. Our experimental results demonstrate that Multiplicative Integration can provide a substantial performance boost over many existing RNN models.

* 10 pages, 2 figures; To appear in NIPS2016 
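
Concretely, MI replaces the additive update phi(Wx + Uh + b) with phi(alpha * Wx * Uh + beta1 * Uh + beta2 * Wx + b), where * is elementwise. A sketch of a vanilla RNN cell with MI follows; sizes and initialization are illustrative.

```python
# Sketch of a vanilla RNN cell with Multiplicative Integration: the
# additive update tanh(Wx + Uh + b) becomes
#   tanh(alpha * Wx * Uh + beta1 * Uh + beta2 * Wx + b)
# with elementwise products, following the paper's general MI form.
import torch
import torch.nn as nn

class MIRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.W = nn.Linear(input_size, hidden_size, bias=False)
        self.U = nn.Linear(hidden_size, hidden_size, bias=False)
        # Gating vectors add almost no parameters relative to W and U.
        self.alpha = nn.Parameter(torch.ones(hidden_size))
        self.beta1 = nn.Parameter(torch.ones(hidden_size))
        self.beta2 = nn.Parameter(torch.ones(hidden_size))
        self.b = nn.Parameter(torch.zeros(hidden_size))

    def forward(self, x, h):
        wx, uh = self.W(x), self.U(h)
        return torch.tanh(self.alpha * wx * uh
                          + self.beta1 * uh + self.beta2 * wx + self.b)

cell = MIRNNCell(16, 32)
h = torch.zeros(4, 32)                  # batch of 4
for x in torch.randn(10, 4, 16):        # 10 time steps
    h = cell(x, h)
```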

Architectural Complexity Measures of Recurrent Neural Networks

Nov 12, 2016
Saizheng Zhang, Yuhuai Wu, Tong Che, Zhouhan Lin, Roland Memisevic, Ruslan Salakhutdinov, Yoshua Bengio

In this paper, we systematically analyze the connecting architectures of recurrent neural networks (RNNs). Our main contribution is twofold: first, we present a rigorous graph-theoretic framework describing the connecting architectures of RNNs in general; second, we propose three architectural complexity measures of RNNs: (a) the recurrent depth, which captures the RNN's over-time nonlinear complexity; (b) the feedforward depth, which captures the local input-output nonlinearity (similar to the "depth" of feedforward neural networks (FNNs)); and (c) the recurrent skip coefficient, which captures how rapidly information propagates over time. We rigorously prove each measure's existence and computability. Our experimental results show that RNNs might benefit from larger recurrent depth and feedforward depth. We further demonstrate that increasing the recurrent skip coefficient offers performance boosts on long-term dependency problems.

* 17 pages, 8 figures; To appear in NIPS2016 
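
Of the three measures, the recurrent skip coefficient is the easiest to illustrate in code: letting h_t read from h_{t-s} instead of h_{t-1} means information crosses n time steps in about n/s hops. A toy sketch of this skip pattern, not the paper's formal graph-theoretic construction:

```python
# Illustrative sketch of raising the recurrent skip coefficient: the
# state at time t reads from the state s steps back, so information
# propagates across time in fewer hops. rnn_cell is a stub for any
# recurrent update.
def run_skip_rnn(inputs, rnn_cell, h0, s=3):
    # states[s + t] will hold h_t; it reads states[t] = h_{t-s}.
    states = [h0] * s
    for t, x in enumerate(inputs):
        states.append(rnn_cell(x, states[t]))
    return states[s:]

# Toy usage: a scalar "cell" so the sketch runs as-is.
hs = run_skip_rnn([1, 2, 3, 4, 5], lambda x, h: 0.5 * h + x, h0=0.0, s=2)
print(hs)
```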