Sanghwan Bae

Revealing User Familiarity Bias in Task-Oriented Dialogue via Interactive Evaluation

May 23, 2023
Takyoung Kim, Jamin Shin, Young-Ho Kim, Sanghwan Bae, Sungdong Kim

Most task-oriented dialogue (TOD) benchmarks assume users who know exactly how to use the system, constraining user behaviors within the system's capabilities via strict user goals, namely the "user familiarity" bias. This data bias deepens when combined with data-driven TOD systems, since existing static evaluations cannot capture its effects. Hence, we conduct an interactive user study to unveil how vulnerable TOD systems are to realistic scenarios. In particular, we compare users given 1) detailed goal instructions that conform to the system boundaries (closed-goal) and 2) vague goal instructions that are often unsupported but realistic (open-goal). Our study reveals that conversations in the open-goal setting lead to catastrophic failures of the system, with 92% of the dialogues exhibiting significant issues. Moreover, we conduct a thorough analysis through error annotation to identify distinctive features between the two settings. From this, we discover a novel "pretending" behavior, in which the system pretends to handle user requests even though they are beyond its capabilities. We discuss its characteristics and toxicity while emphasizing transparency and a fallback strategy for robust TOD systems.
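
As an illustration of the two settings compared in the study, the sketch below contrasts a hypothetical closed-goal instruction with an open-goal one and shows how such instructions could drive an interactive session; the goal wording, `system`, and `user_simulator` are placeholders, not the study's actual materials.

```python
# Hypothetical illustration of the two instruction styles; not the study's
# actual goal descriptions, dialogue systems, or user simulators.

CLOSED_GOAL = (
    "Find a cheap Italian restaurant in the centre and book a table "
    "for 2 people at 18:00 on Friday."  # fully within the system's supported slots
)

OPEN_GOAL = (
    "Plan a nice dinner out with a friend who has a nut allergy."
    # vague and possibly unsupported: allergy constraints may exceed the system
)

def run_session(system, user_simulator, goal_instruction, max_turns=20):
    """Drive one interactive dialogue; the transcript is later error-annotated."""
    history = []
    for _ in range(max_turns):
        user_utt = user_simulator(goal_instruction, history)
        system_utt = system(history + [("user", user_utt)])
        history += [("user", user_utt), ("system", system_utt)]
        if "bye" in user_utt.lower():
            break
    return history
```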

Aligning Large Language Models through Synthetic Feedback

May 23, 2023
Sungdong Kim, Sanghwan Bae, Jamin Shin, Soyoung Kang, Donghyun Kwak, Kang Min Yoo, Minjoon Seo

Aligning large language models (LLMs) to human values has become increasingly important as it enables sophisticated steering of LLMs, e.g., making them follow given instructions while keeping them less toxic. However, it requires significant amounts of human demonstrations and feedback. Recently, open-sourced models have attempted to replicate the alignment learning process by distilling data from already aligned LLMs like InstructGPT or ChatGPT. While this process reduces human effort, constructing these datasets depends heavily on the teacher models. In this work, we propose a novel framework for alignment learning with almost no human labor and no dependency on pre-aligned LLMs. First, we perform reward modeling (RM) with synthetic feedback by contrasting responses from vanilla LLMs of various sizes and prompts. Then, we use the RM to simulate high-quality demonstrations for training a supervised policy and to further optimize the model with reinforcement learning. Our resulting model, Aligned Language Model with Synthetic Training dataset (ALMoST), outperforms open-sourced models, including Alpaca, Dolly, and OpenAssistant, which are trained on the outputs of InstructGPT or human-annotated instructions. Our 7B-sized model outperforms 12-13B models in A/B tests using GPT-4 as the judge, with a winning rate of about 75% on average.
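
The synthetic feedback step can be sketched roughly as below: responses are sampled from vanilla LLMs of different sizes and prompt richness, and the heuristic assumption that stronger configurations yield preferable responses turns them into preference pairs for reward-model training. The model names and the `generate` function are placeholders, not the paper's exact setup.

```python
# Minimal sketch of synthetic preference construction, assuming the heuristic
# that larger models with richer prompts tend to produce better responses.
from itertools import combinations

def generate(model_name: str, prompt: str, num_shots: int) -> str:
    """Placeholder: sample a response from a vanilla (not alignment-tuned) LLM."""
    raise NotImplementedError

# Configurations ordered from assumed-worst to assumed-best (illustrative names).
CONFIGS = [("1b-model", 1), ("7b-model", 3), ("30b-model", 5)]

def synthetic_comparisons(prompt: str):
    """Build (chosen, rejected) pairs for reward-model training without human labels."""
    responses = [generate(name, prompt, shots) for name, shots in CONFIGS]
    pairs = []
    for worse_idx, better_idx in combinations(range(len(responses)), 2):
        # The response from the stronger configuration is treated as preferred.
        pairs.append({"prompt": prompt,
                      "chosen": responses[better_idx],
                      "rejected": responses[worse_idx]})
    return pairs
```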

* Preprint, 9 pages (with 10 pages of supplementary) 

Keep Me Updated! Memory Management in Long-term Conversations

Oct 17, 2022
Sanghwan Bae, Donghyun Kwak, Soyoung Kang, Min Young Lee, Sungdong Kim, Yuin Jeong, Hyeri Kim, Sang-Woo Lee, Woomyoung Park, Nako Sung

Remembering important information from the past and continuing to talk about it in the present are crucial in long-term conversations. However, previous literature does not deal with cases where the memorized information becomes outdated, which may cause confusion in later conversations. To address this issue, we present a novel task and a corresponding dataset for memory management in long-term conversations, in which bots keep track of and bring up the latest information about users while conversing over multiple sessions. To support more precise and interpretable memory, we represent memory as unstructured text descriptions of key information and propose a new memory management mechanism that selectively eliminates invalidated or redundant information. Experimental results show that our approach outperforms baselines that leave the stored memory unchanged in terms of engagingness and humanness, with a larger performance gap especially in later sessions.
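
A minimal sketch of the selective update idea follows, assuming a pairwise operation classifier (stubbed out here) that decides whether a new piece of information invalidates or duplicates a stored sentence; the operation labels are illustrative and not necessarily the paper's exact taxonomy.

```python
# Sketch of selective memory updating over unstructured text memory sentences.

def classify_pair(old: str, new: str) -> str:
    """Placeholder: return 'REPLACE' if `new` invalidates `old`,
    'DUPLICATE' if it is redundant, or 'UNRELATED' otherwise."""
    raise NotImplementedError

def update_memory(memory: list[str], new_facts: list[str]) -> list[str]:
    updated = list(memory)
    for new in new_facts:
        append_new = True
        for i, old in enumerate(list(updated)):
            relation = classify_pair(old, new)
            if relation == "REPLACE":      # outdated information is overwritten
                updated[i] = new
                append_new = False
            elif relation == "DUPLICATE":  # redundant information is not stored twice
                append_new = False
        if append_new:
            updated.append(new)            # genuinely new information is added
    return updated
```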

* Accepted to EMNLP2022 Findings 

Building a Role Specified Open-Domain Dialogue System Leveraging Large-Scale Language Models

Apr 30, 2022
Sanghwan Bae, Donghyun Kwak, Sungdong Kim, Donghoon Ham, Soyoung Kang, Sang-Woo Lee, Woomyoung Park

Recent open-domain dialogue models have brought numerous breakthroughs. However, building such a chat system is not scalable, since it often requires a considerable volume of human-human dialogue data, especially when enforcing features such as persona, style, or safety. In this work, we study the challenge of imposing roles on open-domain dialogue systems, with the goal of making the systems maintain consistent roles while conversing naturally with humans. To accomplish this, the system must satisfy a role specification that includes certain conditions on the stated features as well as a system policy on whether or not certain types of utterances are allowed. To this end, we propose an efficient data collection framework that leverages in-context few-shot learning of large-scale language models to build a role-satisfying dialogue dataset from scratch. We then compare various architectures for open-domain dialogue systems in terms of meeting role specifications while maintaining conversational abilities. Automatic and human evaluations show that our models return few out-of-bounds utterances while keeping competitive performance on general metrics. We release the Korean dialogue dataset we built for further research.
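
The data collection step can be pictured roughly as below: a role specification plus a handful of seed dialogues form a few-shot prompt, and a large language model continues it to produce new role-conditioned dialogues. The role text, prompt format, and `complete` function are illustrative placeholders rather than the paper's exact setup.

```python
# Rough sketch of few-shot dialogue generation under a role specification.

ROLE_SPEC = (
    "The assistant is a caring companion bot for senior citizens. "
    "It never gives medical advice and never discusses politics."
)  # hypothetical role specification

def build_prompt(role_spec: str, seed_dialogues: list[str]) -> str:
    examples = "\n\n".join(seed_dialogues)
    return f"{role_spec}\n\nExample conversations:\n\n{examples}\n\nNew conversation:\nUser:"

def complete(prompt: str) -> str:
    """Placeholder for an in-context few-shot call to a large language model."""
    raise NotImplementedError

def generate_dialogue(seed_dialogues: list[str]) -> str:
    # Generated dialogues would later be filtered to keep only role-consistent,
    # in-bounds conversations before being used as training data.
    return complete(build_prompt(ROLE_SPEC, seed_dialogues))
```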

* Accepted to NAACL2022 as a long paper 

Summary Level Training of Sentence Rewriting for Abstractive Summarization

Sep 26, 2019
Sanghwan Bae, Taeuk Kim, Jihoon Kim, Sang-goo Lee

As an attempt to combine extractive and abstractive summarization, Sentence Rewriting models adopt the strategy of first extracting salient sentences from a document and then paraphrasing the selected ones to generate a summary. However, existing models in this framework mostly rely on sentence-level rewards or suboptimal labels, causing a mismatch between the training objective and the evaluation metric. In this paper, we present a novel training signal that directly maximizes summary-level ROUGE scores through reinforcement learning. In addition, we incorporate BERT into our model, making good use of its natural language understanding capabilities. In extensive experiments, we show that the combination of our proposed model and training procedure obtains new state-of-the-art performance on both the CNN/Daily Mail and New York Times datasets. We also demonstrate that it generalizes better on the DUC-2002 test set.
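
The summary-level training signal can be sketched as a REINFORCE-style loss in which the whole generated summary receives a single ROUGE-based reward, instead of rewarding each sentence independently; `rouge_score` is a stand-in for any summary-level ROUGE implementation, and the exact reward shaping in the paper may differ.

```python
# Minimal sketch of a summary-level reward used in a REINFORCE-style objective.
import torch

def rouge_score(summary: str, reference: str) -> float:
    """Placeholder: return a summary-level ROUGE score (e.g., mean of ROUGE-1/2/L)."""
    raise NotImplementedError

def summary_level_rl_loss(log_probs: torch.Tensor,
                          generated_summary: str,
                          reference_summary: str,
                          baseline: float = 0.0) -> torch.Tensor:
    """REINFORCE loss: one reward for the whole extracted-then-rewritten summary,
    applied to the log-probabilities of the selected sentences."""
    reward = rouge_score(generated_summary, reference_summary) - baseline
    return -reward * log_probs.sum()
```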

* EMNLP 2019 Workshop on New Frontiers in Summarization 

SNU_IDS at SemEval-2019 Task 3: Addressing Training-Test Class Distribution Mismatch in Conversational Classification

Apr 01, 2019
Sanghwan Bae, Jihun Choi, Sang-goo Lee

We present several techniques to tackle the mismatch in class distributions between training and test data in the Contextual Emotion Detection task of SemEval 2019, by extending existing methods for the class imbalance problem. By reducing the distance between the predicted distribution and the ground-truth distribution, they consistently show positive effects on performance. We also propose a novel neural architecture that utilizes a representation of the overall context as well as of each utterance. The combination of these methods and models achieved a micro F1 score of about 0.766 in the final evaluation.
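
One standard correction for a known train/test prior mismatch, shown below, reweights predicted class probabilities by the ratio of test to training priors; this illustrates only the general idea of pulling predictions toward the test-time distribution, not necessarily the exact set of techniques used in the paper.

```python
# Sketch of prior correction for train/test class-distribution mismatch.
import numpy as np

def adjust_for_test_prior(probs: np.ndarray,
                          train_prior: np.ndarray,
                          test_prior: np.ndarray) -> np.ndarray:
    """Reweight predicted class probabilities (shape: examples x classes) by the
    test/train prior ratio, then renormalize each row."""
    adjusted = probs * (test_prior / train_prior)
    return adjusted / adjusted.sum(axis=1, keepdims=True)

# Example: a class that is common in training but rare at test time is down-weighted,
# reducing the gap between the prediction distribution and the test-time ground truth.
```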

* International Workshop on Semantic Evaluation (SemEval 2019) 

Dynamic Compositionality in Recursive Neural Networks with Structure-aware Tag Representations

Sep 07, 2018
Taeuk Kim, Jihun Choi, Daniel Edmiston, Sanghwan Bae, Sang-goo Lee

Most existing recursive neural network (RvNN) architectures utilize only the structure of parse trees, ignoring syntactic tags which are provided as by-products of parsing. We present a novel RvNN architecture that can provide dynamic compositionality by considering comprehensive syntactic information derived from both the structure and linguistic tags. Specifically, we introduce a structure-aware tag representation constructed by a separate tag-level tree-LSTM. With this, we can control the composition function of the existing word-level tree-LSTM by augmenting the representation as a supplementary input to the gate functions of the tree-LSTM. We show that models built upon the proposed architecture obtain superior performance on several sentence-level tasks such as sentiment analysis and natural language inference when compared against previous tree-structured models and other sophisticated neural models. In particular, our models achieve new state-of-the-art results on Stanford Sentiment Treebank, Movie Review, and Text Retrieval Conference datasets.
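
A simplified sketch of the central mechanism follows: the gates of a word-level binary tree-LSTM cell take a structure-aware tag representation as an additional input. Dimensions, naming, and the single-cell framing are illustrative rather than the paper's exact architecture.

```python
# Sketch of a binary tree-LSTM cell whose gates are conditioned on a tag representation.
import torch
import torch.nn as nn

class TagAugmentedTreeLSTMCell(nn.Module):
    def __init__(self, hidden_dim: int, tag_dim: int):
        super().__init__()
        # 5 gates: input, left-forget, right-forget, output, cell candidate
        self.gates = nn.Linear(2 * hidden_dim + tag_dim, 5 * hidden_dim)
        self.hidden_dim = hidden_dim

    def forward(self, h_l, c_l, h_r, c_r, tag_repr):
        # tag_repr would come from a separate tag-level tree-LSTM over the parse tree
        z = self.gates(torch.cat([h_l, h_r, tag_repr], dim=-1))
        i, f_l, f_r, o, g = z.chunk(5, dim=-1)
        c = (torch.sigmoid(i) * torch.tanh(g)
             + torch.sigmoid(f_l) * c_l
             + torch.sigmoid(f_r) * c_r)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c
```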
