Raphael Shu


DiactTOD: Learning Generalizable Latent Dialogue Acts for Controllable Task-Oriented Dialogue Systems

Aug 01, 2023
Qingyang Wu, James Gung, Raphael Shu, Yi Zhang

Dialogue act annotations are important for improving response generation quality in task-oriented dialogue systems. However, it can be challenging to use dialogue acts to control response generation in a generalizable way, because different datasets and tasks may have incompatible annotations. While alternative methods that utilize latent action spaces or reinforcement learning do not require explicit annotations, they may lack interpretability or face difficulties defining task-specific rewards. In this work, we present a novel end-to-end latent dialogue act model (DiactTOD) that represents dialogue acts in a latent space. When pre-trained on a large corpus, DiactTOD can predict and control dialogue acts in a zero-shot fashion, using these latent representations to generate controllable responses. Our approach achieves state-of-the-art performance across a wide range of experimental settings on the MultiWOZ dataset, including zero-shot, few-shot, and full-data fine-tuning in both end-to-end and policy optimization configurations.

* SIGDial 2023 

Pre-training Intent-Aware Encoders for Zero- and Few-Shot Intent Classification

May 24, 2023
Mujeen Sung, James Gung, Elman Mansimov, Nikolaos Pappas, Raphael Shu, Salvatore Romeo, Yi Zhang, Vittorio Castelli

Intent classification (IC) plays an important role in task-oriented dialogue systems, as it identifies user intents from given utterances. However, models trained on limited annotations for IC often suffer from a lack of generalization to unseen intent classes. We propose a novel pre-training method for text encoders that uses contrastive learning with intent pseudo-labels to produce embeddings that are well-suited for IC tasks. Applying this pre-training strategy, we introduce the pre-trained intent-aware encoder (PIE). Specifically, we first train a tagger to identify key phrases within utterances that are crucial for interpreting intents. We then use these extracted phrases to create examples for pre-training a text encoder in a contrastive manner. As a result, our PIE model achieves up to 5.4% and 4.0% higher accuracy than the previous state-of-the-art pre-trained sentence encoder in the N-way zero- and one-shot settings on four IC datasets.
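The contrastive pre-training step can be sketched as an InfoNCE-style loss in which utterances sharing an intent pseudo-label act as positives and the rest of the batch as negatives. This is a minimal sketch of the general technique, not the paper's exact objective; the function name `contrastive_loss`, the temperature value, and the toy embeddings are all assumptions.

```python
import numpy as np

def contrastive_loss(embeddings, pseudo_labels, temperature=0.1):
    """InfoNCE-style loss: utterances that share an intent pseudo-label
    are treated as positives; every other utterance in the batch is a
    negative. Lower loss means same-intent embeddings sit closer."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature                       # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    labels = np.asarray(pseudo_labels)
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    # average the negative log-probability over each anchor's positives
    per_anchor = [-log_prob[i, pos[i]].mean()
                  for i in range(len(labels)) if pos[i].any()]
    return float(np.mean(per_anchor))
```

A batch whose same-label utterances are already clustered scores a lower loss than one where the labels cut across the clusters, which is exactly the signal the encoder is trained on.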


Intent Induction from Conversations for Task-Oriented Dialogue Track at DSTC 11

Apr 25, 2023
James Gung, Raphael Shu, Emily Moeng, Wesley Rose, Salvatore Romeo, Yassine Benajiba, Arshit Gupta, Saab Mansour, Yi Zhang

With increasing demand for and adoption of virtual assistants, recent work has investigated ways to accelerate bot schema design through the automatic induction of intents or the induction of slots and dialogue states. However, a lack of dedicated benchmarks and standardized evaluation has made progress difficult to track and comparisons between systems difficult to make. This challenge track, held as part of the Eleventh Dialog Systems Technology Challenge, introduces a benchmark that aims to evaluate methods for the automatic induction of customer intents in a realistic setting of customer service interactions between human agents and customers. We propose two subtasks for progressively tackling the automatic induction of intents and corresponding evaluation methodologies. We then present three datasets suitable for evaluating the tasks and propose simple baselines. Finally, we summarize the submissions and results of the challenge track, for which we received submissions from 34 teams.

* 18 pages, 1 figure. Accepted at the DSTC 11 Workshop, co-located with SIGDIAL 2023 

Conversation Style Transfer using Few-Shot Learning

Feb 16, 2023
Shamik Roy, Raphael Shu, Nikolaos Pappas, Elman Mansimov, Yi Zhang, Saab Mansour, Dan Roth

Conventional text style transfer approaches for natural language focus on sentence-level style transfer without considering contextual information, and the style is described with attributes (e.g., formality). When applying style transfer on conversations such as task-oriented dialogues, existing approaches suffer from these limitations as context can play an important role and the style attributes are often difficult to define in conversations. In this paper, we introduce conversation style transfer as a few-shot learning problem, where the model learns to perform style transfer by observing only the target-style dialogue examples. We propose a novel in-context learning approach to solve the task with style-free dialogues as a pivot. Human evaluation shows that by incorporating multi-turn context, the model is able to match the target style while having better appropriateness and semantic correctness compared to utterance-level style transfer. Additionally, we show that conversation style transfer can also benefit downstream tasks. Results on multi-domain intent classification tasks show improvement in F1 scores after transferring the style of training data to match the style of test data.


Dialog2API: Task-Oriented Dialogue with API Description and Example Programs

Dec 20, 2022
Raphael Shu, Elman Mansimov, Tamer Alkhouli, Nikolaos Pappas, Salvatore Romeo, Arshit Gupta, Saab Mansour, Yi Zhang, Dan Roth

Functionality and dialogue experience are two important factors of task-oriented dialogue systems. Conventional approaches with a closed schema (e.g., conversational semantic parsing) often fail because both the functionality and the dialogue experience are strongly constrained by the underlying schema. We introduce a new paradigm for task-oriented dialogue, Dialog2API, to greatly expand the functionality and provide a seamless dialogue experience. The conversational model interacts with the environment by generating and executing programs that trigger a set of pre-defined APIs. The model also manages the dialogue policy and interacts with the user by generating appropriate natural language responses. By allowing free-form programs, Dialog2API supports composite goals that combine different APIs, while unrestricted program revision provides a natural and robust dialogue experience. To facilitate Dialog2API, the core model is provided with API documents, an execution environment and, optionally, some example dialogues annotated with programs. We propose an approach tailored for Dialog2API, where the dialogue state is represented by a stack of programs, with the most recently mentioned program on top of the stack. Dialog2API can serve many application scenarios, such as software automation and customer service. In this paper, we construct a dataset for AWS S3 APIs and present evaluation results for in-context learning baselines.
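A toy version of the stack-of-programs dialogue state might look like the sketch below; `DialogState` and the `make_bucket` callable are hypothetical stand-ins for the pre-defined API set, not the paper's actual interface.

```python
class DialogState:
    """Dialogue state as a stack of programs: the most recently
    mentioned program sits on top, where it can be revised (as the
    user corrects themselves) or executed against the environment."""

    def __init__(self, apis):
        self.apis = apis   # maps API name -> callable
        self.stack = []    # each entry: (api_name, kwargs)

    def push(self, api_name, **kwargs):
        self.stack.append((api_name, kwargs))

    def revise_top(self, **kwargs):
        # unrestricted revision of the current (topmost) program
        name, old = self.stack[-1]
        self.stack[-1] = (name, {**old, **kwargs})

    def execute_top(self):
        name, kwargs = self.stack[-1]
        return self.apis[name](**kwargs)
```

Revising the top program instead of pushing a new one is what makes unrestricted mid-dialogue corrections cheap in this representation.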


Federated Semi-Supervised Learning with Prototypical Networks

May 30, 2022
Woojung Kim, Keondo Park, Kihyuk Sohn, Raphael Shu, Hyung-Sin Kim

With the increasing computing power of edge devices, Federated Learning (FL) has emerged to enable model training without privacy concerns. The majority of existing studies assume the data are fully labeled on the client side; in practice, however, the amount of labeled data is often limited. Recently, federated semi-supervised learning (FSSL) has been explored as a way to effectively utilize unlabeled data during training. In this work, we propose ProtoFSSL, a novel FSSL approach based on prototypical networks. In ProtoFSSL, clients share knowledge with each other via lightweight prototypes, which prevents the local models from diverging. To compute a loss on unlabeled data, each client creates accurate pseudo-labels based on the shared prototypes. Jointly with the labeled data, the pseudo-labels provide training signals for the local prototypes. Compared to an FSSL approach based on weight sharing, prototype-based inter-client knowledge sharing significantly reduces both communication and computation costs, enabling more frequent knowledge sharing between more clients for better accuracy. Across multiple datasets, ProtoFSSL achieves higher accuracy than recent FSSL methods with and without knowledge sharing, such as FixMatch, FedRGD, and FedMatch. On the SVHN dataset, ProtoFSSL performs comparably to fully supervised FL methods.
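The prototype-based pseudo-labeling step can be sketched as follows, assuming plain Euclidean prototypical-network inference; the helper names are illustrative, not taken from the paper's code.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Mean embedding per class: the lightweight 'prototype' a client
    can share instead of full model weights."""
    labels = np.asarray(labels)
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def pseudo_label(unlabeled, prototypes):
    """Assign each unlabeled example the class of its nearest prototype
    (Euclidean distance), as in prototypical-network inference."""
    d = np.linalg.norm(unlabeled[:, None, :] - prototypes[None, :, :], axis=-1)
    return d.argmin(axis=1)
```

Because a prototype is just one vector per class, exchanging prototypes between clients costs far less than exchanging weight tensors, which is the communication saving the abstract describes.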


Reward Optimization for Neural Machine Translation with Learned Metrics

Apr 15, 2021
Raphael Shu, Kang Min Yoo, Jung-Woo Ha

Neural machine translation (NMT) models are conventionally trained with token-level negative log-likelihood (NLL), which does not guarantee that the generated translations are optimized for a selected sequence-level evaluation metric. Multiple approaches have been proposed to train NMT with BLEU as the reward in order to improve the metric directly. However, the gain in BLEU was reported not to translate into real quality improvement, limiting the application in industry. Recently, it became clear to the community that BLEU correlates poorly with human judgment when dealing with state-of-the-art models, which has led to the emergence of model-based evaluation metrics. These new metrics are shown to have much higher correlation with human judgment. In this paper, we investigate whether it is beneficial to optimize NMT models with the state-of-the-art model-based metric, BLEURT. We propose a contrastive-margin loss for fast and stable reward optimization suitable for large NMT models. In experiments, we perform automatic and human evaluations to compare models trained with smoothed BLEU and BLEURT to the baseline models. Results show that reward optimization with BLEURT increases the metric scores by a large margin, in contrast to the limited gain from training with smoothed BLEU. The human evaluation shows that models trained with BLEURT improve the adequacy and coverage of translations. Code is available via https://github.com/naver-ai/MetricMT.
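The abstract does not give the exact form of the contrastive-margin loss, so the hinge-style sketch below is an assumption: it simply pushes the model's score for the higher-BLEURT candidate above the lower-scoring one by a fixed margin.

```python
def contrastive_margin_loss(score_better, score_worse, margin=1.0):
    """Hinge-style margin loss over a candidate pair ranked by the
    learned metric (e.g. BLEURT): zero once the better candidate's
    model score exceeds the worse candidate's by at least `margin`."""
    return max(0.0, margin - (score_better - score_worse))
```

Pairwise margin losses of this kind avoid the high-variance policy gradients of RL-style reward optimization, which is one plausible route to the "fast and stable" training the abstract claims.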


GraphPlan: Story Generation by Planning with Event Graph

Feb 05, 2021
Hong Chen, Raphael Shu, Hiroya Takamura, Hideki Nakayama

Story generation is a task that aims to automatically produce multiple sentences that make up a meaningful story. The task is challenging because it requires a high-level understanding of the semantic meaning of sentences and the causality of story events. Naive sequence-to-sequence models generally fail to acquire such knowledge, as logical correctness can hardly be guaranteed in a text generation model without strategic planning. In this paper, we focus on planning a sequence of events assisted by event graphs, and use the events to guide the generator. Instead of using a sequence-to-sequence model to output a storyline as in some existing works, we propose to generate an event sequence by walking on an event graph built automatically from the corpus. To evaluate the proposed approach, we conduct human evaluation of both event planning and story generation. Based on large-scale human annotation results, our proposed approach produces more logically correct event sequences and stories.
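Planning by walking an automatically built event graph can be sketched as below; the bigram graph construction and frequency-proportional sampling are assumptions standing in for the paper's actual graph-building and walking procedure.

```python
import random
from collections import defaultdict

def build_event_graph(event_sequences):
    """Directed graph of event bigrams from a corpus: graph[e] lists
    the events observed to follow e, with repeats preserving frequency."""
    graph = defaultdict(list)
    for seq in event_sequences:
        for a, b in zip(seq, seq[1:]):
            graph[a].append(b)
    return graph

def plan_storyline(graph, start, length, seed=0):
    """Plan an event sequence by walking the graph from a start event;
    successors are sampled in proportion to corpus frequency."""
    rng = random.Random(seed)
    events = [start]
    while len(events) < length and graph[events[-1]]:
        events.append(rng.choice(graph[events[-1]]))
    return events
```

Because every transition in the walk was observed in the corpus, the planned storyline inherits event-to-event plausibility that a free-running sequence-to-sequence decoder cannot guarantee.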


Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation

Sep 15, 2020
Jason Lee, Raphael Shu, Kyunghyun Cho

We propose an efficient inference procedure for non-autoregressive machine translation that iteratively refines translation purely in the continuous space. Given a continuous latent variable model for machine translation (Shu et al., 2020), we train an inference network to approximate the gradient of the marginal log probability of the target sentence, using only the latent variable as input. This allows us to use gradient-based optimization to find the target sentence at inference time that approximately maximizes its marginal probability. As each refinement step only involves computation in the latent space of low dimensionality (we use 8 in our experiments), we avoid computational overhead incurred by existing non-autoregressive inference procedures that often refine in token space. We compare our approach to a recently proposed EM-like inference procedure (Shu et al., 2020) that optimizes in a hybrid space, consisting of both discrete and continuous variables. We evaluate our approach on WMT'14 En-De, WMT'16 Ro-En and IWSLT'16 De-En, and observe two advantages over the EM-like inference: (1) it is computationally efficient, i.e. each refinement step is twice as fast, and (2) it is more effective, resulting in higher marginal probabilities and BLEU scores with the same number of refinement steps. On WMT'14 En-De, for instance, our approach is able to decode 6.2 times faster than the autoregressive model with minimal degradation to translation quality (0.9 BLEU).
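The refinement loop can be sketched as plain gradient ascent in the latent space, with the trained inference network abstracted as a `grad_fn` that approximates the gradient of the marginal log-probability; the loop shape, names, and step size here are assumptions, not the paper's implementation.

```python
import numpy as np

def refine_latent(z0, grad_fn, steps=4, lr=0.25):
    """Iteratively refine a latent code by following an (approximate)
    gradient of the target sentence's marginal log-probability.
    Every step stays in the low-dimensional latent space, so each
    refinement is cheap compared to token-space iteration."""
    z = np.asarray(z0, dtype=float)
    for _ in range(steps):
        z = z + lr * grad_fn(z)
    return z
```

With a toy concave objective such as -||z - z*||^2 (gradient -2(z - z*)), each step moves the code measurably closer to the optimum, mirroring how the real procedure raises the marginal probability per step.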

* Accepted to EMNLP 2020 

Latent-Variable Non-Autoregressive Neural Machine Translation with Deterministic Inference using a Delta Posterior

Sep 10, 2019
Raphael Shu, Jason Lee, Hideki Nakayama, Kyunghyun Cho

Although neural machine translation models have reached high translation quality, their autoregressive nature makes inference difficult to parallelize and leads to high translation latency. Inspired by recent refinement-based approaches, we propose a latent-variable non-autoregressive model with continuous latent variables and a deterministic inference procedure. In contrast to existing approaches, we use a deterministic iterative inference algorithm to find a target sequence that maximizes the lower bound on the log-probability. During inference, the translation length adapts automatically. Our experiments show that the lower bound can be greatly increased by running the inference algorithm for only one step, resulting in significantly improved translation quality. Our proposed model closes the gap between non-autoregressive and autoregressive approaches on the ASPEC Ja-En dataset with 7.8x faster decoding. On the WMT'14 En-De dataset, our model narrows the performance gap with the autoregressive baseline to 2.0 BLEU points with a 12.5x speedup.
