Saizheng Zhang

Personalizing Dialogue Agents: I have a dog, do you have pets too?

Sep 25, 2018

Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, Jason Weston

Abstract:Chit-chat models are known to have several problems: they lack specificity, do not display a consistent personality and are often not very captivating. In this work we present the task of making chit-chat more engaging by conditioning on profile information. We collect data and train models to (i) condition on their given profile information; and (ii) information about the person they are talking to, resulting in improved dialogues, as measured by next utterance prediction. Since (ii) is initially unknown our model is trained to engage its partner with personal topics, and we show the resulting dialogue can be used to predict profile information about the interlocutors.

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

Sep 25, 2018

Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning

Abstract:Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems' ability to extract relevant facts and perform necessary comparison. We show that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.

* EMNLP 2018 long paper. The first three authors contribute equally. Data, code, and blog posts available at https://hotpotqa.github.io/
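The sentence-level supporting facts mentioned in the abstract can be illustrated with a small sketch. It assumes a record layout in which `context` is a list of `[title, sentences]` pairs and `supporting_facts` is a list of `[title, sentence_index]` pairs; the function name `supporting_sentences` and the toy record are invented for illustration and are not taken from the dataset.

```python
def supporting_sentences(example):
    """Return (title, sentence) pairs named by the supporting facts."""
    # Index paragraphs by title so each fact can be looked up directly.
    paragraphs = {title: sentences for title, sentences in example["context"]}
    found = []
    for title, sent_id in example["supporting_facts"]:
        sentences = paragraphs.get(title, [])
        # Skip facts that point outside the paragraph instead of failing.
        if 0 <= sent_id < len(sentences):
            found.append((title, sentences[sent_id]))
    return found


# Toy comparison question (made-up content, assumed layout).
toy = {
    "question": "Which river is longer, River A or River B?",
    "answer": "River A",
    "context": [
        ["River A", ["River A is a river.", "It is 900 km long."]],
        ["River B", ["River B is a river.", "It is 400 km long."]],
        ["Town C", ["Town C is a town."]],
    ],
    "supporting_facts": [["River A", 1], ["River B", 1]],
}

facts = supporting_sentences(toy)
for title, sentence in facts:
    print(f"{title}: {sentence}")
```

Answering the toy question requires one sentence from each of two paragraphs, while the third paragraph is a distractor; this is the multi-document pattern the abstract describes.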

Neural Models for Key Phrase Detection and Question Generation

May 30, 2018

Sandeep Subramanian, Tong Wang, Xingdi Yuan, Saizheng Zhang, Yoshua Bengio, Adam Trischler

Abstract:We propose a two-stage neural model to tackle question generation from documents. First, our model estimates the probability that word sequences in a document are ones that a human would pick when selecting candidate answers by training a neural key-phrase extractor on the answers in a question-answering corpus. Predicted key phrases then act as target answers and condition a sequence-to-sequence question-generation model with a copy mechanism. Empirically, our key-phrase extraction model significantly outperforms an entity-tagging baseline and existing rule-based approaches. We further demonstrate that our question generation system formulates fluent, answerable questions from key phrases. This two-stage system could be used to augment or generate reading comprehension datasets, which may be leveraged to improve machine reading systems or in educational settings.

* Machine Reading for Question Answering workshop at ACL 2018

Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent

Apr 16, 2018

Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela, Jason Weston

Abstract:Contrary to most natural language processing research, which makes use of static datasets, humans learn language interactively, grounded in an environment. In this work we propose an interactive learning procedure called Mechanical Turker Descent (MTD) and use it to train agents to execute natural language commands grounded in a fantasy text adventure game. In MTD, Turkers compete to train better agents in the short term, and collaborate by sharing their agents' skills in the long term. This results in a gamified, engaging experience for the Turkers and a better quality teaching signal for the agents compared to static datasets, as the Turkers naturally adapt the training data to the agent's abilities.

A Deep Reinforcement Learning Chatbot (Short Version)

Jan 20, 2018

Iulian V. Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke (+8 more)

Abstract:We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including neural network and template-based models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than other systems. The results highlight the potential of coupling ensemble systems with deep reinforcement learning as a fruitful path for developing real-world, open-domain conversational agents.

* 9 pages, 1 figure, 2 tables; presented at NIPS 2017, Conversational AI: "Today's Practice and Tomorrow's Potential" Workshop

A Deep Reinforcement Learning Chatbot

Nov 05, 2017

Iulian V. Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke (+8 more)

Abstract:We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including template-based models, bag-of-words models, sequence-to-sequence neural network and latent variable neural network models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than many competing systems. Due to its machine learning architecture, the system is likely to improve with additional data.

* 40 pages, 9 figures, 11 tables

Machine Comprehension by Text-to-Text Neural Question Generation

May 15, 2017

Xingdi Yuan, Tong Wang, Caglar Gulcehre, Alessandro Sordoni, Philip Bachman, Sandeep Subramanian, Saizheng Zhang, Adam Trischler

Abstract:We propose a recurrent neural model that generates natural-language questions from documents, conditioned on answers. We show how to train the model using a combination of supervised and reinforcement learning. After teacher forcing for standard maximum likelihood training, we fine-tune the model using policy gradient techniques to maximize several rewards that measure question quality. Most notably, one of these rewards is the performance of a question-answering system. We motivate question generation as a means to improve the performance of question answering systems. Our model is trained and evaluated on the recent question-answering dataset SQuAD.

Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks

Jan 10, 2017

Ying Zhang, Mohammad Pezeshki, Philemon Brakel, Saizheng Zhang, Cesar Laurent, Yoshua Bengio, Aaron Courville

Abstract:Convolutional Neural Networks (CNNs) are effective models for reducing spectral variations and modeling spectral correlations in acoustic features for automatic speech recognition (ASR). Hybrid speech recognition systems incorporating CNNs with Hidden Markov Models/Gaussian Mixture Models (HMMs/GMMs) have achieved the state-of-the-art in various benchmarks. Meanwhile, Connectionist Temporal Classification (CTC) with Recurrent Neural Networks (RNNs), which is proposed for labeling unsegmented sequences, makes it feasible to train an end-to-end speech recognition system instead of hybrid settings. However, RNNs are computationally expensive and sometimes difficult to train. In this paper, inspired by the advantages of both CNNs and the CTC approach, we propose an end-to-end speech framework for sequence labeling, by combining hierarchical CNNs with CTC directly without recurrent connections. By evaluating the approach on the TIMIT phoneme recognition task, we show that the proposed model is not only computationally efficient, but also competitive with the existing baseline systems. Moreover, we argue that CNNs have the capability to model temporal correlations with appropriate context information.
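As a rough illustration of how CTC labels unsegmented sequences, the sketch below shows the usual greedy decoding rule: take one label per frame, collapse consecutive repeats, then drop the blank symbol. It is a generic sketch, not the paper's decoder; the blank index `0` and the function name `ctc_greedy_decode` are assumptions.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse consecutive repeated labels, then remove blanks."""
    # groupby yields one key per run of identical consecutive labels.
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]


# Per-frame argmax labels for a short utterance (made-up values).
frames = [0, 3, 3, 0, 3, 5, 5, 0, 0]
decoded = ctc_greedy_decode(frames)
print(decoded)  # [3, 3, 5]
```

The blank between the two runs of `3` is what lets the output contain the same label twice in a row; without it the two runs would merge into one.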

On Multiplicative Integration with Recurrent Neural Networks

Nov 12, 2016

Yuhuai Wu, Saizheng Zhang, Ying Zhang, Yoshua Bengio, Ruslan Salakhutdinov

Abstract:We introduce a general and simple structural design called Multiplicative Integration (MI) to improve recurrent neural networks (RNNs). MI changes the way in which information from different sources flows and is integrated in the computational building block of an RNN, while introducing almost no extra parameters. The new structure can be easily embedded into many popular RNN models, including LSTMs and GRUs. We empirically analyze its learning behaviour and conduct evaluations on several tasks using different RNN models. Our experimental results demonstrate that Multiplicative Integration can provide a substantial performance boost over many of the existing RNN models.

* 10 pages, 2 figures; To appear in NIPS 2016
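A minimal sketch of the idea, assuming the simplest form of MI in which the additive block tanh(Wx + Uh + b) becomes tanh(Wx ⊙ Uh + b), with ⊙ the elementwise product. The helper names and the toy weights below are illustrative only, and the paper's gated variants with extra bias vectors are not shown.

```python
import math


def matvec(M, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def additive_step(W, U, b, x, h):
    """Standard RNN building block: tanh(Wx + Uh + b)."""
    wx, uh = matvec(W, x), matvec(U, h)
    return [math.tanh(a + c + d) for a, c, d in zip(wx, uh, b)]


def mi_step(W, U, b, x, h):
    """Multiplicative Integration: tanh(Wx * Uh + b), elementwise product."""
    wx, uh = matvec(W, x), matvec(U, h)
    return [math.tanh(a * c + d) for a, c, d in zip(wx, uh, b)]


# Toy parameters (made-up values); both steps share the same W, U, b.
W = [[0.5, -0.2], [0.1, 0.3]]
U = [[0.4, 0.0], [-0.3, 0.2]]
b = [0.0, 0.0]
x = [1.0, 2.0]
h = [0.5, -1.0]

h_add = additive_step(W, U, b, x, h)
h_mi = mi_step(W, U, b, x, h)
print(h_add, h_mi)
```

Both functions use exactly the same parameters, which matches the abstract's point that the change adds almost no extra parameters: only the way the two information sources are combined differs.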

Architectural Complexity Measures of Recurrent Neural Networks

Nov 12, 2016

Saizheng Zhang, Yuhuai Wu, Tong Che, Zhouhan Lin, Roland Memisevic, Ruslan Salakhutdinov, Yoshua Bengio

Abstract:In this paper, we systematically analyze the connecting architectures of recurrent neural networks (RNNs). Our main contribution is twofold: first, we present a rigorous graph-theoretic framework describing the connecting architectures of RNNs in general. Second, we propose three architecture complexity measures of RNNs: (a) the recurrent depth, which captures the RNN's over-time nonlinear complexity, (b) the feedforward depth, which captures the local input-output nonlinearity (similar to the "depth" in feedforward neural networks (FNNs)), and (c) the recurrent skip coefficient which captures how rapidly the information propagates over time. We rigorously prove each measure's existence and computability. Our experimental results show that RNNs might benefit from larger recurrent depth and feedforward depth. We further demonstrate that increasing recurrent skip coefficient offers performance boosts on long term dependency problems.

* 17 pages, 8 figures; To appear in NIPS 2016
