As the wide adoption of intelligent chatbot in human daily life, user demands for such systems evolve from basic task-solving conversations to more casual and friend-like communication. To meet the user needs and build emotional bond with users, it is essential for social chatbots to incorporate more human-like and advanced linguistic features. In this paper, we investigate the usage of a commonly used rhetorical device by human -- metaphor for social chatbot. Our work first designs a metaphor generation framework, which generates topic-aware and novel figurative sentences. By embedding the framework into a chatbot system, we then enables the chatbot to communicate with users using figurative language. Human annotators validate the novelty and properness of the generated metaphors. More importantly, we evaluate the effects of employing metaphors in human-chatbot conversations. Experiments indicate that our system effectively arouses user interests in communicating with our chatbot, resulting in significantly longer human-chatbot conversations.
Interview chatbots engage users in a text-based conversation to draw out their views and opinions. It is, however, challenging to build effective interview chatbots that can handle user free-text responses to open-ended questions and deliver engaging user experience. As the first step, we are investigating the feasibility and effectiveness of using publicly available, practical AI technologies to build effective interview chatbots. To demonstrate feasibility, we built a prototype scoped to enable interview chatbots with a subset of active listening skills - the abilities to comprehend a user's input and respond properly. To evaluate the effectiveness of our prototype, we compared the performance of interview chatbots with or without active listening skills on four common interview topics in a live evaluation with 206 users. Our work presents practical design implications for building effective interview chatbots, hybrid chatbot platforms, and empathetic chatbots beyond interview tasks.
The rise of increasingly more powerful chatbots offers a new way to collect information through conversational surveys, where a chatbot asks open-ended questions, interprets a user's free-text responses, and probes answers when needed. To investigate the effectiveness and limitations of such a chatbot in conducting surveys, we conducted a field study involving about 600 participants. In this study, half of the participants took a typical online survey on Qualtrics and the other half interacted with an AI-powered chatbot to complete a conversational survey. Our detailed analysis of over 5200 free-text responses revealed that the chatbot drove a significantly higher level of participant engagement and elicited significantly better quality responses in terms of relevance, depth, and readability. Based on our results, we discuss design implications for creating AI-powered chatbots to conduct effective surveys and beyond.
Conversational systems have come a long way since their inception in the 1960s. After decades of research and development, we've seen progress from Eliza and Parry in the 60's and 70's, to task-completion systems as in the DARPA Communicator program in the 2000s, to intelligent personal assistants such as Siri in the 2010s, to today's social chatbots like XiaoIce. Social chatbots' appeal lies not only in their ability to respond to users' diverse requests, but also in being able to establish an emotional connection with users. The latter is done by satisfying users' need for communication, affection, as well as social belonging. To further the advancement and adoption of social chatbots, their design must focus on user engagement and take both intellectual quotient (IQ) and emotional quotient (EQ) into account. Users should want to engage with a social chatbot; as such, we define the success metric for social chatbots as conversation-turns per session (CPS). Using XiaoIce as an illustrative example, we discuss key technologies in building social chatbots from core chat to visual awareness to skills. We also show how XiaoIce can dynamically recognize emotion and engage the user throughout long conversations with appropriate interpersonal responses. As we become the first generation of humans ever living with AI, we have a responsibility to design social chatbots to be both useful and empathetic, so they will become ubiquitous and help society as a whole.
A conversational agent (chatbot) is a piece of software that is able to communicate with humans using natural language. Modeling conversation is an important task in natural language processing and artificial intelligence. While chatbots can be used for various tasks, in general they have to understand users' utterances and provide responses that are relevant to the problem at hand. In my work, I conduct an in-depth survey of recent literature, examining over 70 publications related to chatbots published in the last 3 years. Then, I proceed to make the argument that the very nature of the general conversation domain demands approaches that are different from current state-of-of-the-art architectures. Based on several examples from the literature I show why current chatbot models fail to take into account enough priors when generating responses and how this affects the quality of the conversation. In the case of chatbots, these priors can be outside sources of information that the conversation is conditioned on like the persona or mood of the conversers. In addition to presenting the reasons behind this problem, I propose several ideas on how it could be remedied. The next section focuses on adapting the very recent Transformer model to the chatbot domain, which is currently state-of-the-art in neural machine translation. I first present experiments with the vanilla model, using conversations extracted from the Cornell Movie-Dialog Corpus. Secondly, I augment the model with some of my ideas regarding the issues of encoder-decoder architectures. More specifically, I feed additional features into the model like mood or persona together with the raw conversation data. Finally, I conduct a detailed analysis of how the vanilla model performs on conversational data by comparing it to previous chatbot models and how the additional features affect the quality of the generated responses.
Textual conversational agent or chatbots' development gather tremendous traction from both academia and industries in recent years. Nowadays, chatbots are widely used as an agent to communicate with a human in some services such as booking assistant, customer service, and also a personal partner. The biggest challenge in building chatbot is to build a humanizing machine to improve user engagement. Some studies show that emotion is an important aspect to humanize machine, including chatbot. In this paper, we will provide a systematic review of approaches in building an emotionally-aware chatbot (EAC). As far as our knowledge, there is still no work focusing on this area. We propose three research question regarding EAC studies. We start with the history and evolution of EAC, then several approaches to build EAC by previous studies, and some available resources in building EAC. Based on our investigation, we found that in the early development, EAC exploits a simple rule-based approach while now most of EAC use neural-based approach. We also notice that most of EAC contain emotion classifier in their architecture, which utilize several available affective resources. We also predict that the development of EAC will continue to gain more and more attention from scholars, noted by some recent studies propose new datasets for building EAC in various languages.
We propose a new method to detect when users express the intent to leave a service, also known as churn. While previous work focuses solely on social media, we show that this intent can be detected in chatbot conversations. As companies increasingly rely on chatbots they need an overview of potentially churny users. To this end, we crowdsource and publish a dataset of churn intent expressions in chatbot interactions in German and English. We show that classifiers trained on social media data can detect the same intent in the context of chatbots. We introduce a classification architecture that outperforms existing work on churn intent detection in social media. Moreover, we show that, using bilingual word embeddings, a system trained on combined English and German data outperforms monolingual approaches. As the only existing dataset is in English, we crowdsource and publish a novel dataset of German tweets. We thus underline the universal aspect of the problem, as examples of churn intent in English help us identify churn in German tweets and chatbot conversations.
Internet of Things (IoT) is emerging as a significant technology in shaping the future by connecting physical devices or things with internet. It also presents various opportunities for intersection of other technological trends which can allow it to become even more intelligent and efficient. In this paper we focus our attention on the integration of Intelligent Conversational Software Agents or Chatbots with IoT. Literature surveys have looked into various applications, features, underlying technologies and known challenges of IoT. On the other hand, Chatbots are being adopted in greater numbers due to major strides in development of platforms and frameworks. The novelty of this paper lies in the specific integration of Chatbots in the IoT scenario. We analyzed the shortcomings of existing IoT systems and put forward ways to tackle them by incorporating chatbots. A general architecture is proposed for implementing such a system, as well as platforms and frameworks, both commercial and open source, which allow for implementation of such systems. Identification of the newer challenges and possible future directions with this new integration, have also been addressed.
Trainable chatbots that exhibit fluent and human-like conversations remain a big challenge in artificial intelligence. Deep Reinforcement Learning (DRL) is promising for addressing this challenge, but its successful application remains an open question. This article describes a novel ensemble-based approach applied to value-based DRL chatbots, which use finite action sets as a form of meaning representation. In our approach, while dialogue actions are derived from sentence clustering, the training datasets in our ensemble are derived from dialogue clustering. The latter aim to induce specialised agents that learn to interact in a particular style. In order to facilitate neural chatbot training using our proposed approach, we assume dialogue data in raw text only -- without any manually-labelled data. Experimental results using chitchat data reveal that (1) near human-like dialogue policies can be induced, (2) generalisation to unseen data is a difficult problem, and (3) training an ensemble of chatbot agents is essential for improved performance over using a single agent. In addition to evaluations using held-out data, our results are further supported by a human evaluation that rated dialogues in terms of fluency, engagingness and consistency -- which revealed that our proposed dialogue rewards strongly correlate with human judgements.
Much research in computational argumentation assumes that arguments and counterarguments can be obtained in some way. Yet, to improve and apply models of argument, we need methods for acquiring them. Current approaches include argument mining from text, hand coding of arguments by researchers, or generating arguments from knowledge bases. In this paper, we propose a new approach, which we call argument harvesting, that uses a chatbot to enter into a dialogue with a participant to get arguments and counterarguments from him or her. Because it is automated, the chatbot can be used repeatedly in many dialogues, and thereby it can generate a large corpus. We describe the architecture of the chatbot, provide methods for managing a corpus of arguments and counterarguments, and an evaluation of our approach in a case study concerning attitudes of women to participation in sport.
Conversational agents, also known as chatbots, are versatile tools that have the potential of being used in dialogical argumentation. They could possibly be deployed in tasks such as persuasion for behaviour change (e.g. persuading people to eat more fruit, to take regular exercise, etc.) However, to achieve this, there is a need to develop methods for acquiring appropriate arguments and counterargument that reflect both sides of the discussion. For instance, to persuade someone to do regular exercise, the chatbot needs to know counterarguments that the user might have for not doing exercise. To address this need, we present methods for acquiring arguments and counterarguments, and importantly, meta-level information that can be useful for deciding when arguments can be used during an argumentation dialogue. We evaluate these methods in studies with participants and show how harnessing these methods in a chatbot can make it more persuasive.
The development of natural language processing algorithms and the explosive growth of conversational data are encouraging researches on the human-computer conversation. Still, getting qualified conversational data on a large scale is difficult and expensive. In this paper, we verify the feasibility of constructing a data-driven chatbot with processed online community posts by using them as pseudo-conversational data. We argue that chatbots for various purposes can be built extensively through the pipeline exploiting the common structure of community posts. Our experiment demonstrates that chatbots created along the pipeline can yield the proper responses.
A chatbot that converses like a human should be goal-oriented (i.e., be purposeful in conversation), which is beyond language generation. However, existing dialogue systems often heavily rely on cumbersome hand-crafted rules or costly labelled datasets to reach the goals. In this paper, we propose Goal-oriented Chatbots (GoChat), a framework for end-to-end training chatbots to maximize the longterm return from offline multi-turn dialogue datasets. Our framework utilizes hierarchical reinforcement learning (HRL), where the high-level policy guides the conversation towards the final goal by determining some sub-goals, and the low-level policy fulfills the sub-goals by generating the corresponding utterance for response. In our experiments on a real-world dialogue dataset for anti-fraud in financial, our approach outperforms previous methods on both the quality of response generation as well as the success rate of accomplishing the goal.
In this paper, we propose Evebot, an innovative, sequence to sequence (Seq2seq) based, fully generative conversational system for the diagnosis of negative emotions and prevention of depression through positively suggestive responses. The system consists of an assembly of deep-learning based models, including Bi-LSTM based model for detecting negative emotions of users and obtaining psychological counselling related corpus for training the chatbot, anti-language sequence to sequence neural network, and maximum mutual information (MMI) model. As adolescents are reluctant to show their negative emotions in physical interaction, traditional methods of emotion analysis and comforting methods may not work. Therefore, this system puts emphasis on using virtual platform to detect signs of depression or anxiety, channel adolescents' stress and mood, and thus prevent the emergence of mental illness. We launched the integrated chatbot system onto an online platform for real-world campus applications. Through a one-month user study, we observe better results in the increase in positivity than other public chatbots in the control group.
We describe and validate a metric for estimating multi-class classifier performance based on cross-validation and adapted for improvement of small, unbalanced natural-language datasets used in chatbot design. Our experiences draw upon building recruitment chatbots that mediate communication between job-seekers and recruiters by exposing the ML/NLP dataset to the recruiting team. Evaluation approaches must be understandable to various stakeholders, and useful for improving chatbot performance. The metric, nex-cv, uses negative examples in the evaluation of text classification, and fulfils three requirements. First, it is actionable: it can be used by non-developer staff. Second, it is not overly optimistic compared to human ratings, making it a fast method for comparing classifiers. Third, it allows model-agnostic comparison, making it useful for comparing systems despite implementation differences. We validate the metric based on seven recruitment-domain datasets in English and German over the course of one year.
Systems powered by artificial intelligence are being developed to be more user-friendly by communicating with users in a progressively human-like conversational way. Chatbots, also known as dialogue systems, interactive conversational agents, or virtual agents are an example of such systems used in a wide variety of applications ranging from customer support in the business domain to companionship in the healthcare sector. It is becoming increasingly important to develop chatbots that can best respond to the personalized needs of their users so that they can be as helpful to the user as possible in a real human way. This paper investigates and compares three popular existing chatbots API offerings and then propose and develop a voice interactive and multilingual chatbot that can effectively respond to users mood, tone, and language using IBM Watson Assistant, Tone Analyzer, and Language Translator. The chatbot was evaluated using a use case that was targeted at responding to users needs regarding exam stress based on university students survey data generated using Google Forms. The results of measuring the chatbot effectiveness at analyzing responses regarding exam stress indicate that the chatbot responding appropriately to the user queries regarding how they are feeling about exams 76.5%. The chatbot could also be adapted for use in other application areas such as student info-centers, government kiosks, and mental health support systems.
Creating open-domain chatbots requires large amounts of conversational data and related benchmark tasks to evaluate them. Standardized evaluation tasks are crucial for creating automatic evaluation metrics for model development; otherwise, comparing the models would require resource-expensive human evaluation. While chatbot challenges have recently managed to provide a plethora of such resources for English, resources in other languages are not yet available. In this work, we provide a starting point for Finnish open-domain chatbot research. We describe our collection efforts to create the Finnish chat conversation corpus FinChat, which is made available publicly. FinChat includes unscripted conversations on seven topics from people of different ages. Using this corpus, we also construct a retrieval-based evaluation task for Finnish chatbot development. We observe that off-the-shelf chatbot models trained on conversational corpora do not perform better than chance at choosing the right answer based on automatic metrics, while humans can do the same task almost perfectly. Similarly, in a human evaluation, responses to questions from the evaluation set generated by the chatbots are predominantly marked as incoherent. Thus, FinChat provides a challenging evaluation set, meant to encourage chatbot development in Finnish.
There is a need for systems to dynamically interact with ageing populations to gather information, monitor health condition and provide support, especially after hospital discharge or at-home settings. Several smart devices have been delivered by digital health, bundled with telemedicine systems, smartphone and other digital services. While such solutions offer personalised data and suggestions, the real disruptive step comes from the interaction of new digital ecosystem, represented by chatbots. Chatbots will play a leading role by embodying the function of a virtual assistant and bridging the gap between patients and clinicians. Powered by AI and machine learning algorithms, chatbots are forecasted to save healthcare costs when used in place of a human or assist them as a preliminary step of helping to assess a condition and providing self-care recommendations. This paper describes integrating chatbots into telemedicine systems intended for elderly patient after their hospital discharge. The paper discusses possible ways to utilise chatbots to assist healthcare providers and support patients with their condition.
Delivery of digital behaviour change interventions which encourage physical activity has been tried in many forms. Most often interventions are delivered as text notifications, but these do not promote interaction. Advances in conversational AI have improved natural language understanding and generation, allowing AI chatbots to provide an engaging experience with the user. For this reason, chatbots have recently been seen in healthcare delivering digital interventions through free text or choice selection. In this work, we explore the use of voice-based AI chatbots as a novel mode of intervention delivery, specifically targeting older adults to encourage physical activity. We co-created "FitChat", an AI chatbot, with older adults and we evaluate the first prototype using Think Aloud Sessions. Our thematic evaluation suggests that older adults prefer voice-based chat over text notifications or free text entry and that voice is a powerful mode for encouraging motivation.
Large end-to-end neural open-domain chatbots are becoming increasingly popular. However, research on building such chatbots has typically assumed that the user input is written in nature and it is not clear whether these chatbots would seamlessly integrate with automatic speech recognition (ASR) models to serve the speech modality. We aim to bring attention to this important question by empirically studying the effects of various types of synthetic and actual ASR hypotheses in the dialog history on TransferTransfo, a state-of-the-art Generative Pre-trained Transformer (GPT) based neural open-domain dialog system from the NeurIPS ConvAI2 challenge. We observe that TransferTransfo trained on written data is very sensitive to such hypotheses introduced to the dialog history during inference time. As a baseline mitigation strategy, we introduce synthetic ASR hypotheses to the dialog history during training and observe marginal improvements, demonstrating the need for further research into techniques to make end-to-end open-domain chatbots fully speech-robust. To the best of our knowledge, this is the first study to evaluate the effects of synthetic and actual ASR hypotheses on a state-of-the-art neural open-domain dialog system and we hope it promotes speech-robustness as an evaluation criterion in open-domain dialog.