Recent advances in deep neural networks, language modeling and language generation have introduced new ideas to the field of conversational agents. As a result, deep neural models such as sequence-to-sequence, Memory Networks, and the Transformer have become key ingredients of state-of-the-art dialog systems. While those models are able to generate meaningful responses even in unseen situation, they need a lot of training data to build a reliable model. Thus, most real-world systems stuck to traditional approaches based on information retrieval and even hand-crafted rules, due to their robustness and effectiveness, especially for narrow-focused conversations. Here, we present a method that adapts a deep neural architecture from the domain of machine reading comprehension to re-rank the suggested answers from different models using the question as context. We train our model using negative sampling based on question-answer pairs from the Twitter Customer Support Dataset.The experimental results show that our re-ranking framework can improve the performance in terms of word overlap and semantics both for individual models as well as for model combinations.
Diversity is a long-studied topic in information retrieval that usually refers to the requirement that retrieved results should be non-repetitive and cover different aspects. In a conversational setting, an additional dimension of diversity matters: an engaging response generation system should be able to output responses that are diverse and interesting. Sequence-to-sequence (Seq2Seq) models have been shown to be very effective for response generation. However, dialogue responses generated by Seq2Seq models tend to have low diversity. In this paper, we review known sources and existing approaches to this low-diversity problem. We also identify a source of low diversity that has been little studied so far, namely model over-confidence. We sketch several directions for tackling model over-confidence and, hence, the low-diversity problem, including confidence penalties and label smoothing.
In this paper, we propose an interactive matching network (IMN) to enhance the representations of contexts and responses at both the word level and sentence level for the multi-turn response selection task. First, IMN constructs word representations from three aspects to address the challenge of out-of-vocabulary (OOV) words. Second, an attentive hierarchical recurrent encoder (AHRE), which is capable of encoding sentences hierarchically and generating more descriptive representations by aggregating with an attention mechanism, is designed. Finally, the bidirectional interactions between whole multi-turn contexts and response candidates are calculated to derive the matching information between them. Experiments on four public datasets show that IMN significantly outperforms the baseline models by large margins on all metrics, achieving new state-of-the-art performance and demonstrating compatibility across domains for multi-turn response selection.
The growing demand for mental health support has highlighted the importance of conversational agents as human supporters worldwide and in China. These agents could increase availability and reduce the relative costs of mental health support. The provided support can be divided into two main types: cognitive and emotional support. Existing work on this topic mainly focuses on constructing agents that adopt Cognitive Behavioral Therapy (CBT) principles. Such agents operate based on pre-defined templates and exercises to provide cognitive support. However, research on emotional support using such agents is limited. In addition, most of the constructed agents operate in English, highlighting the importance of conducting such studies in China. In this study, we analyze the effectiveness of Emohaa in reducing symptoms of mental distress. Emohaa is a conversational agent that provides cognitive support through CBT-based exercises and guided conversations. It also emotionally supports users by enabling them to vent their desired emotional problems. The study included 134 participants, split into three groups: Emohaa (CBT-based), Emohaa (Full), and control. Experimental results demonstrated that compared to the control group, participants who used Emohaa experienced considerably more significant improvements in symptoms of mental distress. We also found that adding the emotional support agent had a complementary effect on such improvements, mainly depression and insomnia. Based on the obtained results and participants' satisfaction with the platform, we concluded that Emohaa is a practical and effective tool for reducing mental distress.
Existing multi-turn context-response matching methods mainly concentrate on obtaining multi-level and multi-dimension representations and better interactions between context utterances and response. However, in real-place conversation scenarios, whether a response candidate is suitable not only counts on the given dialogue context but also other backgrounds, e.g., wording habits, user-specific dialogue history content. To fill the gap between these up-to-date methods and the real-world applications, we incorporate user-specific dialogue history into the response selection and propose a personalized hybrid matching network (PHMN). Our contributions are two-fold: 1) our model extracts personalized wording behaviors from user-specific dialogue history as extra matching information; 2) we perform hybrid representation learning on context-response utterances and explicitly incorporate a customized attention mechanism to extract vital information from context-response interactions so as to improve the accuracy of matching. We evaluate our model on two large datasets with user identification, i.e., personalized Ubuntu dialogue Corpus (P-Ubuntu) and personalized Weibo dataset (P-Weibo). Experimental results confirm that our method significantly outperforms several strong models by combining personalized attention, wording behaviors, and hybrid representation learning.
Non-task-oriented dialog models suffer from poor quality and non-diverse responses. To overcome limited conversational data, we apply Simulated Multiple Reference Training (SMRT; Khayrallah et al., 2020), and use a paraphraser to simulate multiple responses per training prompt. We find SMRT improves over a strong Transformer baseline as measured by human and automatic quality scores and lexical diversity. We also find SMRT is comparable to pretraining in human evaluation quality, and outperforms pretraining on automatic quality and lexical diversity, without requiring related-domain dialog data.
We consider the importance of different utterances in the context for selecting the response usually depends on the current query. In this paper, we propose the model TripleNet to fully model the task with the triple instead of in previous works. The heart of TripleNet is a novel attention mechanism named triple attention to model the relationships within the triple at four levels. The new mechanism updates the representation for each element based on the attention with the other two concurrently and symmetrically. We match the triple centered on the response from char to context level for prediction. Experimental results on two large-scale multi-turn response selection datasets show that the proposed model can significantly outperform the state-of-the-art methods. TripleNet source code is available at https://github.com/wtma/TripleNet
Natural Language Understanding (NLU) is a core component of dialog systems. It typically involves two tasks - intent classification (IC) and slot labeling (SL), which are then followed by a dialogue management (DM) component. Such NLU systems cater to utterances in isolation, thus pushing the problem of context management to DM. However, contextual information is critical to the correct prediction of intents and slots in a conversation. Prior work on contextual NLU has been limited in terms of the types of contextual signals used and the understanding of their impact on the model. In this work, we propose a context-aware self-attentive NLU (CASA-NLU) model that uses multiple signals, such as previous intents, slots, dialog acts and utterances over a variable context window, in addition to the current user utterance. CASA-NLU outperforms a recurrent contextual NLU baseline on two conversational datasets, yielding a gain of up to 7% on the IC task for one of the datasets. Moreover, a non-contextual variant of CASA-NLU achieves state-of-the-art performance for IC task on standard public datasets - Snips and ATIS.