Arshit Gupta

User Simulation with Large Language Models for Evaluating Task-Oriented Dialogue

Sep 23, 2023
Sam Davidson, Salvatore Romeo, Raphael Shu, James Gung, Arshit Gupta, Saab Mansour, Yi Zhang

One of the major impediments to the development of new task-oriented dialogue (TOD) systems is the need for human evaluation at multiple stages and iterations of the development process. In an effort to move toward automated evaluation of TOD, we propose a novel user simulator built using recently developed large pretrained language models (LLMs). To increase the linguistic diversity of our system relative to related previous work, we do not fine-tune the LLMs used by our system on existing TOD datasets; rather, we use in-context learning to prompt the LLMs to generate robust and linguistically diverse output with the goal of simulating the behavior of human interlocutors. Unlike previous work, which sought to maximize goal success rate (GSR) as the primary metric of simulator performance, our goal is a system that achieves a GSR similar to that observed in human interactions with TOD systems. Using this approach, our current simulator is able to interact effectively with several TOD systems, especially on single-intent conversational goals, while generating lexically and syntactically diverse output relative to previous simulators that rely upon fine-tuned models. Finally, we collect a Human2Bot dataset of humans interacting with the same TOD systems with which we experimented in order to better quantify these achievements.
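
To make the simulation loop concrete, here is a minimal sketch of prompting an LLM to play the user via in-context learning. The prompt wording, the DONE completion signal, and the `complete` and `bot_reply` callables are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch of an in-context-learning user simulator for TOD evaluation.
# The prompt format and goal handling are assumptions; `complete` stands in
# for any LLM completion API and `bot_reply` for the TOD system under test.
from typing import Callable, List

def build_user_prompt(goal: str, history: List[str]) -> str:
    """Assemble a prompt that asks the LLM to play the user, not the bot."""
    lines = [
        "You are a customer talking to a support bot.",
        f"Your goal: {goal}",
        "Stay in character, vary your wording, and say DONE when the goal is met.",
        "",
    ]
    lines += history          # alternating "User: ..." / "Bot: ..." turns so far
    lines.append("User:")     # cue the model to produce the next user turn
    return "\n".join(lines)

def simulate_dialogue(goal: str,
                      bot_reply: Callable[[str], str],
                      complete: Callable[[str], str],
                      max_turns: int = 10) -> List[str]:
    """Alternate simulated user turns with the TOD system's replies."""
    history: List[str] = []
    for _ in range(max_turns):
        user_turn = complete(build_user_prompt(goal, history)).strip()
        history.append(f"User: {user_turn}")
        if "DONE" in user_turn:           # simulator signals goal completion
            break
        history.append(f"Bot: {bot_reply(user_turn)}")
    return history
```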

* 13 pages 

NatCS: Eliciting Natural Customer Support Dialogues

May 04, 2023
James Gung, Emily Moeng, Wesley Rose, Arshit Gupta, Yi Zhang, Saab Mansour

Despite growing interest in applications based on natural customer support conversations, there exist remarkably few publicly available datasets that reflect the expected characteristics of conversations in these settings. Existing task-oriented dialogue datasets, which were collected to benchmark dialogue systems mainly in written human-to-bot settings, are not representative of real customer support conversations and do not provide realistic benchmarks for systems that are applied to natural data. To address this gap, we introduce NatCS, a multi-domain collection of spoken customer service conversations. We describe our process for collecting synthetic conversations between customers and agents based on natural language phenomena observed in real conversations. Compared to previous dialogue datasets, the conversations collected with our approach are more representative of real human-to-human conversations along multiple metrics. Finally, we demonstrate potential uses of NatCS, including dialogue act classification and intent induction from conversations, showing that dialogue act annotations in NatCS provide more effective training data for modeling real conversations than existing synthetic written datasets. We publicly release NatCS to facilitate research in natural dialog systems.
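
As a rough illustration of the dialogue act classification use case, the sketch below trains a toy classifier on NatCS-style (utterance, act) pairs. The example utterances and act labels are invented for illustration; the released data format may differ.

```python
# Sketch: training a dialogue act classifier on NatCS-style annotations.
# The (utterance, act) pairs below are hypothetical examples of spoken-style
# customer support turns; the actual NatCS labels and format may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train = [
    ("hi thanks for calling how can i help you today", "Greeting"),
    ("yeah so um my card got declined twice", "InformIntent"),
    ("can i get your account number please", "RequestInformation"),
    ("sure it's uh four five six seven", "InformInformation"),
]
texts, acts = zip(*train)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, acts)
print(clf.predict(["could you read me the card number"]))
```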

* Accepted to Findings of ACL 2023 

Intent Induction from Conversations for Task-Oriented Dialogue Track at DSTC 11

Apr 25, 2023
James Gung, Raphael Shu, Emily Moeng, Wesley Rose, Salvatore Romeo, Yassine Benajiba, Arshit Gupta, Saab Mansour, Yi Zhang

With increasing demand for and adoption of virtual assistants, recent work has investigated ways to accelerate bot schema design through the automatic induction of intents or the induction of slots and dialogue states. However, a lack of dedicated benchmarks and standardized evaluation has made progress difficult to track and comparisons between systems difficult to make. This challenge track, held as part of the Eleventh Dialog Systems Technology Challenge, introduces a benchmark that aims to evaluate methods for the automatic induction of customer intents in a realistic setting of customer service interactions between human agents and customers. We propose two subtasks for progressively tackling the automatic induction of intents and corresponding evaluation methodologies. We then present three datasets suitable for evaluating the tasks and propose simple baselines. Finally, we summarize the submissions and results of the challenge track, for which we received submissions from 34 teams.
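
A common baseline for the induction subtasks is to embed customer utterances and cluster them, treating each cluster as an induced intent. The sketch below assumes a sentence-transformers encoder and k-means; the track's official baselines may differ in encoder, clustering method, and evaluation.

```python
# Sketch of a simple intent induction baseline: embed customer utterances,
# cluster the embeddings, and treat each cluster as an induced intent.
# The encoder checkpoint and cluster count are assumptions for illustration.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

utterances = [
    "i want to cancel my subscription",
    "please stop billing me and close the account",
    "what's the balance on my account",
    "how much do i currently owe",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # assumed encoder choice
embeddings = embedder.encode(utterances)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)
for label, utt in zip(kmeans.labels_, utterances):
    print(label, utt)
```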

* 18 pages, 1 figure. Accepted at the DSTC 11 Workshop, co-located with SIGDIAL 2023 

Dialog2API: Task-Oriented Dialogue with API Description and Example Programs

Dec 20, 2022
Raphael Shu, Elman Mansimov, Tamer Alkhouli, Nikolaos Pappas, Salvatore Romeo, Arshit Gupta, Saab Mansour, Yi Zhang, Dan Roth

Functionality and dialogue experience are two important factors of task-oriented dialogue systems. Conventional approaches with a closed schema (e.g., conversational semantic parsing) often fail because both the functionality and the dialogue experience are strongly constrained by the underlying schema. We introduce a new paradigm for task-oriented dialogue, Dialog2API, that greatly expands the functionality and provides a seamless dialogue experience. The conversational model interacts with the environment by generating and executing programs that trigger a set of pre-defined APIs. The model also manages the dialogue policy and interacts with the user by generating appropriate natural language responses. By allowing free-form program generation, Dialog2API supports composite goals that combine different APIs, while unrestricted program revision provides a natural and robust dialogue experience. To facilitate Dialog2API, the core model is provided with API documents, an execution environment, and optionally some example dialogues annotated with programs. We propose an approach tailored to Dialog2API in which the dialogue state is represented by a stack of programs, with the most recently mentioned program on top of the stack. Dialog2API can work with many application scenarios such as software automation and customer service. In this paper, we construct a dataset for AWS S3 APIs and present evaluation results of in-context learning baselines.
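
The stack-of-programs dialogue state can be pictured as below. This is a minimal sketch with invented class and method names; the paper's actual state representation and execution environment may differ.

```python
# Sketch of a stack-of-programs dialogue state in the spirit of Dialog2API:
# each user turn yields a generated program, and execution or revision keeps
# the most recently discussed program on top. All names are illustrative.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Program:
    source: str                       # e.g. generated code calling S3 APIs
    executed: bool = False

@dataclass
class DialogueState:
    stack: List[Program] = field(default_factory=list)

    def push(self, program: Program) -> None:
        """A newly generated program becomes the current focus."""
        self.stack.append(program)

    def revise_top(self, new_source: str) -> None:
        """Unrestricted revision: replace the most recent program in place."""
        self.stack[-1] = Program(source=new_source)

    def execute_top(self, runner: Callable[[str], object]) -> object:
        """Run the focused program against the environment (e.g. a sandbox)."""
        result = runner(self.stack[-1].source)
        self.stack[-1].executed = True
        return result
```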

Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System

Sep 29, 2021
Yixuan Su, Lei Shu, Elman Mansimov, Arshit Gupta, Deng Cai, Yi-An Lai, Yi Zhang

Pre-trained language models have recently been shown to benefit task-oriented dialogue (TOD) systems. Despite their success, existing methods often formulate this task as a cascaded generation problem, which can lead to error accumulation across sub-tasks and greater data annotation overhead. In this study, we present PPTOD, a unified plug-and-play model for task-oriented dialogue. In addition, we introduce a new dialogue multi-task pre-training strategy that allows the model to learn the primary TOD task completion skills from heterogeneous dialog corpora. We extensively test our model on three benchmark TOD tasks: end-to-end dialogue modelling, dialogue state tracking, and intent classification. Experimental results show that PPTOD achieves new state-of-the-art results on all evaluated tasks in both high-resource and low-resource scenarios. Furthermore, comparisons against previous SOTA methods show that the responses generated by PPTOD are more factually correct and semantically coherent, as judged by human annotators.
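
The plug-and-play idea can be illustrated with a single seq2seq model whose sub-task is selected by a textual prompt, as sketched below. The vanilla t5-small checkpoint is a stand-in and the prompt strings paraphrase the scheme; PPTOD's released checkpoints and exact prefixes may differ.

```python
# Sketch of plug-and-play task selection: one seq2seq model, different TOD
# sub-tasks chosen by a task prompt. The checkpoint and prompt strings are
# stand-ins for illustration, not PPTOD's released model or exact prefixes.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

dialogue = "user: i need a cheap hotel in the north of town"
prompts = {
    "dst":    "translate dialogue to belief state: ",
    "intent": "translate dialogue to user intent: ",
    "nlg":    "translate dialogue to system response: ",
}

for task, prefix in prompts.items():
    ids = tokenizer(prefix + dialogue, return_tensors="pt").input_ids
    out = model.generate(ids, max_length=64)
    print(task, tokenizer.decode(out[0], skip_special_tokens=True))
```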

Goal-Embedded Dual Hierarchical Model for Task-Oriented Dialogue Generation

Sep 19, 2019
Yi-An Lai, Arshit Gupta, Yi Zhang

Hierarchical neural networks are often used to model inherent structures within dialogues. For goal-oriented dialogues, however, these models lack a mechanism for adhering to goals and neglect the distinct conversational patterns of the two interlocutors. In this work, we propose the Goal-Embedded Dual Hierarchical Attentional Encoder-Decoder (G-DuHA), which is able to center on goals and capture interlocutor-level disparity while modeling goal-oriented dialogues. Experiments on dialogue generation, response generation, and human evaluations demonstrate that the proposed model successfully generates higher-quality, more diverse, and goal-centric dialogues. Moreover, applying G-DuHA for data augmentation via goal-oriented dialogue generation improves the performance of task-oriented dialog systems.
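
A rough PyTorch sketch of the goal-embedded dual hierarchy is given below: a goal encoder seeds two interlocutor-specific context RNNs. Dimensions and wiring are illustrative assumptions, and the attention mechanisms and decoder are omitted for brevity.

```python
# Sketch of the G-DuHA idea: a goal encoder conditions two interlocutor-
# specific hierarchical context RNNs (the "dual" hierarchies). Attention and
# the decoder are omitted; sizes and wiring are illustrative assumptions.
import torch
import torch.nn as nn

class GoalEmbeddedDualHierarchy(nn.Module):
    def __init__(self, vocab_size: int, d: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.utt_enc = nn.GRU(d, d, batch_first=True)    # word-level encoder
        self.goal_enc = nn.GRU(d, d, batch_first=True)   # encodes goal tokens
        self.ctx_user = nn.GRUCell(d, d)                 # user-side hierarchy
        self.ctx_sys = nn.GRUCell(d, d)                  # system-side hierarchy

    def forward(self, goal_tokens, turns, speakers):
        """turns: list of (batch, len) token tensors; speakers: 0=user, 1=system."""
        _, g = self.goal_enc(self.embed(goal_tokens))
        h_user = h_sys = g.squeeze(0)                    # goal seeds both contexts
        for tokens, spk in zip(turns, speakers):
            _, u = self.utt_enc(self.embed(tokens))
            u = u.squeeze(0)
            if spk == 0:
                h_user = self.ctx_user(u, h_user)
            else:
                h_sys = self.ctx_sys(u, h_sys)
        return h_user, h_sys                             # contexts for decoding
```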

* Accepted by CoNLL-2019 

CASA-NLU: Context-Aware Self-Attentive Natural Language Understanding for Task-Oriented Chatbots

Sep 18, 2019
Arshit Gupta, Peng Zhang, Garima Lalwani, Mona Diab

Natural Language Understanding (NLU) is a core component of dialog systems. It typically involves two tasks, intent classification (IC) and slot labeling (SL), which are then followed by a dialogue management (DM) component. Such NLU systems cater to utterances in isolation, thus pushing the problem of context management to DM. However, contextual information is critical to the correct prediction of intents and slots in a conversation. Prior work on contextual NLU has been limited in terms of the types of contextual signals used and the understanding of their impact on the model. In this work, we propose a context-aware self-attentive NLU (CASA-NLU) model that uses multiple signals, such as previous intents, slots, dialog acts, and utterances over a variable context window, in addition to the current user utterance. CASA-NLU outperforms a recurrent contextual NLU baseline on two conversational datasets, yielding a gain of up to 7% on the IC task for one of the datasets. Moreover, a non-contextual variant of CASA-NLU achieves state-of-the-art performance on the IC task for the standard public datasets Snips and ATIS.
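
The sketch below illustrates how such context signals might be composed with the current utterance before self-attention. The sizes, the choice of signals, and the pooling are assumptions rather than the paper's exact architecture.

```python
# Sketch of composing context signals with the current utterance: embeddings
# of previous intents and dialog acts over a fixed window are prepended to the
# token sequence before self-attention. Details are illustrative assumptions.
import torch
import torch.nn as nn

class ContextAwareNLU(nn.Module):
    def __init__(self, vocab, n_intents, n_acts, d=128, window=3):
        super().__init__()
        self.tok = nn.Embedding(vocab, d)
        self.intent_emb = nn.Embedding(n_intents, d)   # previous-intent signal
        self.act_emb = nn.Embedding(n_acts, d)         # previous-dialog-act signal
        self.attn = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.intent_head = nn.Linear(d, n_intents)
        self.window = window

    def forward(self, utt_ids, prev_intents, prev_acts):
        """utt_ids: (B, T); prev_intents, prev_acts: (B, window) label ids."""
        ctx = self.intent_emb(prev_intents) + self.act_emb(prev_acts)  # (B, W, d)
        seq = torch.cat([ctx, self.tok(utt_ids)], dim=1)   # prepend context
        enc = self.attn(seq)
        pooled = enc[:, self.window:].mean(dim=1)          # pool utterance tokens
        return self.intent_head(pooled)                    # IC logits
```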

* To appear at EMNLP 2019 

Simple, Fast, Accurate Intent Classification and Slot Labeling

Mar 19, 2019
Arshit Gupta, John Hewitt, Katrin Kirchhoff

In real-time dialogue systems running at scale, there is a tradeoff between system performance, time taken for training to converge, and time taken to perform inference. In this work, we study modeling tradeoffs for intent classification (IC) and slot labeling (SL), focusing on non-recurrent models. We propose a simple, modular family of neural architectures for joint IC+SL. Using this framework, we explore a number of self-attention, convolutional, and recurrent models, contributing a large-scale analysis of modeling paradigms for IC+SL across two datasets. We also discuss a class of 'label-recurrent' models, showing that otherwise non-recurrent models augmented with a 10-dimensional representation of the label history provide multi-point SL improvements. As a result of our analysis, we propose a class of label-recurrent, dilated, convolutional IC+SL systems that are accurate, achieving a 30% error reduction in SL over the state-of-the-art performance on the Snips dataset, as well as fast, running at 2x the inference speed of comparable recurrent models with 1/2 to 2/3 of their training time.
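
A minimal sketch of a label-recurrent dilated-convolutional tagger follows: the token encoder is non-recurrent, while a 10-dimensional embedding of the previously predicted label is fed back at each step. The details are illustrative, not the paper's exact configuration.

```python
# Sketch of a label-recurrent dilated-convolutional SL tagger: stacked dilated
# convolutions encode tokens without recurrence, and a small embedding of the
# previous predicted label is fed back per step. Sizes are illustrative.
import torch
import torch.nn as nn

class LabelRecurrentTagger(nn.Module):
    def __init__(self, vocab, n_labels, d=128, label_dim=10):
        super().__init__()
        self.tok = nn.Embedding(vocab, d)
        self.convs = nn.ModuleList(
            nn.Conv1d(d, d, kernel_size=3, padding=r, dilation=r) for r in (1, 2, 4)
        )
        self.label_emb = nn.Embedding(n_labels, label_dim)
        self.out = nn.Linear(d + label_dim, n_labels)

    def forward(self, token_ids):
        x = self.tok(token_ids).transpose(1, 2)          # (B, d, T) for conv
        for conv in self.convs:
            x = torch.relu(conv(x)) + x                  # dilated residual stack
        x = x.transpose(1, 2)                            # back to (B, T, d)
        batch, length, _ = x.shape
        prev = torch.zeros(batch, dtype=torch.long)      # label id 0 = start/"O"
        logits = []
        for t in range(length):                          # greedy label recurrence
            step = torch.cat([x[:, t], self.label_emb(prev)], dim=-1)
            step_logits = self.out(step)
            logits.append(step_logits)
            prev = step_logits.argmax(dim=-1)
        return torch.stack(logits, dim=1)                # (B, T, n_labels)
```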
