Hongru Wang

A Survey of the Evolution of Language Model-Based Dialogue Systems

Nov 28, 2023
Hongru Wang, Lingzhi Wang, Yiming Du, Liang Chen, Jingyan Zhou, Yufei Wang, Kam-Fai Wong

Dialogue systems, including task-oriented dialogue systems (TOD) and open-domain dialogue systems (ODD), have undergone significant transformations, with language models (LMs) playing a central role. This survey delves into the historical trajectory of dialogue systems, elucidating their intricate relationship with advancements in language models by categorizing this evolution into four distinct stages, each marked by pivotal LM breakthroughs: 1) Early stage: characterized by statistical LMs, resulting in rule-based or machine-learning-driven dialogue systems; 2) Independent development of TOD and ODD based on neural language models (NLMs; e.g., LSTM and GRU), since NLMs lack intrinsic knowledge in their parameters; 3) Fusion between different types of dialogue systems with the advent of pre-trained language models (PLMs), starting from the fusion of the four sub-tasks within TOD, and then of TOD with ODD; and 4) The current LLM-based dialogue systems, wherein LLMs can conduct TOD and ODD seamlessly. Thus, our survey provides a chronological perspective aligned with LM breakthroughs, offering a comprehensive review of state-of-the-art research outcomes. Moreover, we focus on emerging topics and discuss open challenges, providing valuable insights into future directions for LLM-based dialogue systems. Through this exploration, we pave the way for a deeper comprehension of this evolution, guiding future developments in LM-based dialogue systems.

Self-Guard: Empower the LLM to Safeguard Itself

Oct 24, 2023
Zezhong Wang, Fangkai Yang, Lu Wang, Pu Zhao, Hongru Wang, Liang Chen, Qingwei Lin, Kam-Fai Wong

Jailbreak attacks can bypass the safety measures of a Large Language Model (LLM), generating harmful content. This misuse of LLMs has led to negative societal consequences. Currently, there are two main approaches to addressing jailbreak attacks: safety training and safeguards. Safety training focuses on further training the LLM to enhance its safety, while safeguards implement external models or filters to prevent harmful outputs. However, safety training struggles to adapt to new attack types and often degrades model performance, and safeguards have proven to be of limited help. To tackle these issues, we propose Self-Guard, a novel approach that combines the strengths of both safety methods. Self-Guard includes two stages. In the first stage, we enhance the model's ability to assess harmful content, and in the second stage, we instruct the model to consistently perform harmful-content detection on its own responses. Experiments demonstrate that Self-Guard is robust against jailbreak attacks. In a bad-case analysis, we find that the LLM occasionally provides harmless responses to harmful queries. Additionally, we evaluate the general capabilities of the LLM before and after safety training, providing evidence that Self-Guard does not degrade the LLM's performance. In sensitivity tests, Self-Guard not only avoids inducing over-sensitivity in the LLM but can even mitigate this issue.
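
As a rough illustration of the stage-2 behaviour, here is a minimal Python sketch of the inference-time self-check, assuming a model already fine-tuned (stage 1) to append a [harmful]/[harmless] tag to its own answer; query_llm and the tag format are hypothetical stand-ins, not the authors' implementation.

```python
def query_llm(prompt: str) -> str:
    # Stand-in: a Self-Guard-trained model answers the query and then
    # appends a self-assessment tag to the end of its response.
    return "I can't help with that request. [harmful]"

def self_guard_respond(user_query: str) -> str:
    instruction = (
        "Answer the user's query. Then review your own answer and append "
        "[harmful] if it contains harmful content, otherwise [harmless]."
    )
    raw = query_llm(f"{instruction}\n\nUser: {user_query}")
    # The external filter only has to check the tag, which is far simpler
    # than classifying arbitrary jailbreak prompts up front.
    if raw.rstrip().endswith("[harmful]"):
        return "Sorry, I cannot help with that."
    return raw.rsplit("[harmless]", 1)[0].strip()

print(self_guard_respond("How do I pick a lock?"))
```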

Improving Factual Consistency for Knowledge-Grounded Dialogue Systems via Knowledge Enhancement and Alignment

Oct 16, 2023
Boyang Xue, Weichao Wang, Hongru Wang, Fei Mi, Rui Wang, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong

Knowledge-grounded dialogue systems based on pretrained language models (PLMs) are prone to generating responses that are factually inconsistent with the provided knowledge source. In such inconsistent responses, the dialogue models fail to accurately express the external knowledge they rely upon. Inspired by previous work which identified that feed-forward networks (FFNs) within Transformers are responsible for factual knowledge expression, we investigate two methods to efficiently improve the factual expression capability of FFNs, by knowledge enhancement and alignment respectively. We first propose K-Dial, which explicitly introduces extended FFNs in Transformers to enhance factual knowledge expression given the specific patterns of knowledge-grounded dialogue inputs. Additionally, we apply the reinforcement learning for factual consistency (RLFC) method to implicitly adjust FFNs' expressions in responses by aligning them with gold knowledge according to a factual-consistency preference. To comprehensively assess the factual consistency and dialogue quality of responses, we employ extensive automatic measures and human evaluations, including sophisticated fine-grained NLI-based metrics. Experimental results on the WoW and CMU_DoG datasets demonstrate that our methods efficiently enhance the ability of the FFN module to convey factual knowledge, validating the efficacy of improving factual consistency for knowledge-grounded dialogue systems.
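
To make the FFN-extension idea concrete, the following is a minimal PyTorch sketch of widening a Transformer FFN with extra, knowledge-dedicated neurons whose output is added to the original path; the module name, dimensions, and additive fusion are illustrative assumptions, not the released K-Dial code.

```python
import torch
import torch.nn as nn

class ExtendedFFN(nn.Module):
    def __init__(self, d_model: int = 768, d_ff: int = 3072, d_extra: int = 256):
        super().__init__()
        # Original FFN of the pretrained model (kept as-is).
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)
        # Extra neurons trained to strengthen factual knowledge expression.
        self.k_up = nn.Linear(d_model, d_extra)
        self.k_down = nn.Linear(d_extra, d_model)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        original = self.down(self.act(self.up(x)))
        knowledge = self.k_down(self.act(self.k_up(x)))
        return original + knowledge  # additive fusion of both paths

ffn = ExtendedFFN()
out = ffn(torch.randn(2, 16, 768))  # (batch, seq_len, d_model)
print(out.shape)  # torch.Size([2, 16, 768])
```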

Large Language Models as Source Planner for Personalized Knowledge-grounded Dialogue

Oct 13, 2023
Hongru Wang, Minda Hu, Yang Deng, Rui Wang, Fei Mi, Weichao Wang, Yasheng Wang, Wai-Chung Kwan, Irwin King, Kam-Fai Wong

Open-domain dialogue systems usually require different sources of knowledge to generate more informative and evidential responses. However, existing knowledge-grounded dialogue systems either focus on a single knowledge source or overlook the dependency between multiple sources of knowledge, which may result in inconsistent or even paradoxical responses. To incorporate multiple knowledge sources and the dependencies between them, we propose SAFARI, a novel framework that leverages the exceptional capabilities of large language models (LLMs) in planning, understanding, and incorporating knowledge under both supervised and unsupervised settings. Specifically, SAFARI decouples knowledge grounding over multiple sources from response generation, which allows easy extension to various knowledge sources, including the possibility of using no source at all. To study the problem, we construct a personalized knowledge-grounded dialogue dataset, Knowledge Behind Persona (KBP), which is the first to consider the dependency between persona and implicit knowledge. Experimental results on the KBP dataset demonstrate that the SAFARI framework can effectively produce persona-consistent and knowledge-enhanced responses.
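
A minimal sketch of the decoupled plan-retrieve-generate loop is shown below, assuming hypothetical call_llm and retrieve helpers and an illustrative "SOURCE -> SOURCE" plan format; the actual SAFARI prompts, source names, and dataset schema differ.

```python
def call_llm(prompt: str) -> str:
    # Stand-in LLM; a real model returns a plan such as "PERSONA -> DOCUMENTS"
    # (or "NULL" for no grounding), or the final response, per the prompt.
    return "PERSONA -> DOCUMENTS" if prompt.startswith("Decide") else "<grounded response>"

def retrieve(source: str, query: str) -> str:
    return f"<top result from {source} for '{query}'>"  # stand-in retriever

def safari_respond(dialogue_context: str) -> str:
    # Step 1: plan which sources to use, if any.
    plan = call_llm(f"Decide which sources to use for:\n{dialogue_context}")
    sources = [s.strip() for s in plan.split("->")] if plan != "NULL" else []
    # Step 2: retrieve sequentially, so later sources can depend on earlier ones.
    evidence = [retrieve(src, dialogue_context) for src in sources]
    # Step 3: generate the final response grounded on the gathered evidence.
    return call_llm(
        f"Context:\n{dialogue_context}\nEvidence:\n" + "\n".join(evidence)
        + "\nRespond consistently with the evidence."
    )

print(safari_respond("User: What outdoor hobby would suit me?"))
```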

TPE: Towards Better Compositional Reasoning over Conceptual Tools with Multi-persona Collaboration

Sep 28, 2023
Hongru Wang, Huimin Wang, Lingzhi Wang, Minda Hu, Rui Wang, Boyang Xue, Hongyuan Lu, Fei Mi, Kam-Fai Wong

Large language models (LLMs) have demonstrated exceptional performance in planning the use of various functional tools, such as calculators and retrievers, particularly in question-answering tasks. In this paper, we expand the definition of these tools, centering on conceptual tools within the context of dialogue systems. A conceptual tool specifies a cognitive concept that aids systematic or investigative thought. These conceptual tools play important roles in practice, such as multiple psychological or tutoring strategies being dynamically applied in a single turn to compose helpful responses. To further enhance the reasoning and planning capability of LLMs with these conceptual tools, we introduce a multi-persona collaboration framework: Think-Plan-Execute (TPE). This framework decouples the response generation process into three distinct roles: Thinker, Planner, and Executor. Specifically, the Thinker analyzes the internal status exhibited in the dialogue context, such as user emotions and preferences, to formulate a global guideline. The Planner then generates executable plans to call different conceptual tools (e.g., sources or strategies), while the Executor compiles all intermediate results into a coherent response. This structured approach not only enhances the explainability and controllability of responses but also reduces token redundancy. We demonstrate the effectiveness of TPE across various dialogue response generation tasks, including multi-source (FoCus) and multi-strategy interactions (CIMA and PsyQA). This reveals its potential to handle real-world dialogue interactions that require more complicated tool learning beyond just functional tools. The full code and data will be released for reproduction.
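
A minimal sketch of the three-role decomposition follows, with one hypothetical call_llm stand-in playing every persona via role-specific prompts; the actual TPE prompts and tool inventories differ.

```python
def call_llm(role_prompt: str, content: str) -> str:
    # Stand-in completion; a real LLM is prompted with the role and content.
    return f"<{role_prompt.split(':')[0]} output>"

def tpe_respond(dialogue_context: str) -> str:
    # Thinker: analyze internal status (emotions, preferences) -> guideline.
    guideline = call_llm("Thinker: summarize the user's state and set a "
                         "global response guideline.", dialogue_context)
    # Planner: turn the guideline into an executable sequence of conceptual
    # tools (knowledge sources or tutoring/psychological strategies).
    plan = call_llm("Planner: list the conceptual tools to apply, in order.",
                    guideline)
    # Executor: follow the plan and compile intermediate results into one
    # coherent response.
    return call_llm("Executor: compose the final response following the plan.",
                    f"{dialogue_context}\nGuideline: {guideline}\nPlan: {plan}")

print(tpe_respond("User: I failed my exam and feel awful."))
```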

JoTR: A Joint Transformer and Reinforcement Learning Framework for Dialog Policy Learning

Sep 01, 2023
Wai-Chung Kwan, Huimin Wang, Hongru Wang, Zezhong Wang, Xian Wu, Yefeng Zheng, Kam-Fai Wong

Dialogue policy learning (DPL) is a crucial component of dialogue modelling. Its primary role is to determine the appropriate abstract response, commonly referred to as the "dialogue action". Traditional DPL methodologies have treated this as a sequential decision problem, using pre-defined action candidates extracted from a corpus. However, these incomplete candidates can significantly limit the diversity of responses and pose challenges when dealing with edge cases, which are scenarios that occur only at extreme operating parameters. To address these limitations, we introduce a novel framework, JoTR. This framework is unique in that it leverages a text-to-text Transformer-based model to generate flexible dialogue actions. Unlike traditional methods, JoTR formulates a word-level policy that allows for more dynamic and adaptable dialogue action generation, without the need for any action templates. This setting enhances the diversity of responses and improves the system's ability to handle edge cases effectively. In addition, JoTR employs reinforcement learning with a reward-shaping mechanism to efficiently finetune the word-level dialogue policy, which allows the model to learn from its interactions and improve its performance over time. Our extensive evaluation shows that JoTR achieves state-of-the-art performance on two benchmark dialogue modelling tasks, as assessed by both user simulators and human evaluators.

* Our code, models and other related resources are publicly available at https://github.com/KwanWaiChung/JoTR 
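
As a rough illustration of word-level policy optimization with a shaped reward, here is a toy REINFORCE step on a stand-in policy; the real JoTR uses a text-to-text Transformer, and the reward function below is a placeholder.

```python
import torch
import torch.nn as nn

vocab_size, hidden = 100, 32
# Tiny stand-in for a text-to-text Transformer word-level policy.
policy = nn.Sequential(nn.Embedding(vocab_size, hidden),
                       nn.Linear(hidden, vocab_size))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def shaped_reward(tokens: torch.Tensor) -> float:
    # Placeholder: a real reward combines task success with per-step shaping.
    return 1.0 if tokens[-1].item() % 2 == 0 else -0.1

context = torch.tensor([1, 5, 7])   # token ids of the dialogue context
logits = policy(context)            # one next-token distribution per position
dist = torch.distributions.Categorical(logits=logits)
actions = dist.sample()             # sampled word-level dialogue action
loss = -dist.log_prob(actions).sum() * shaped_reward(actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print("policy-gradient step done, loss:", float(loss))
```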

Self-Critique Prompting with Large Language Models for Inductive Instructions

May 23, 2023
Rui Wang, Hongru Wang, Fei Mi, Yi Chen, Ruifeng Xu, Kam-Fai Wong

Numerous works have been proposed to improve or evaluate the capabilities of large language models (LLMs) to fulfill user instructions. However, they neglect the possibility that user inputs may inherently contain incorrect information due to users' false beliefs or malicious intents. Blindly adhering to users' false content will thus cause deception and harm. To address this problem, we propose a challenging benchmark of Inductive Instructions (INDust) to evaluate whether LLMs can resist such instructions. INDust includes 15K instructions across three categories: Fact-Checking Instructions, Questions based on False Premises, and Creative Instructions based on False Premises. Our experiments on several strong LLMs reveal that current LLMs can be easily deceived by INDust into generating misleading and malicious statements. Hence we employ Self-Critique prompting to encourage LLMs to critique not only themselves, as in previous works, but also the users, which shows remarkable improvements in handling inductive instructions under both zero-shot and few-shot settings.
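
A minimal sketch of the Self-Critique loop as described above: draft, critique both the draft and the user's premise, then revise; call_llm and the prompts are hypothetical stand-ins.

```python
def call_llm(prompt: str) -> str:
    return "<model completion>"  # replace with a real LLM call

def self_critique_respond(instruction: str) -> str:
    draft = call_llm(instruction)
    # Critique the model's own draft AND the user's instruction itself.
    critique = call_llm(
        f"Instruction: {instruction}\nDraft answer: {draft}\n"
        "Critique the draft AND the instruction: does the instruction rest "
        "on a false premise or ask for something misleading?"
    )
    # Revise: correct a false premise rather than complying with it.
    return call_llm(
        f"Instruction: {instruction}\nDraft: {draft}\nCritique: {critique}\n"
        "Write a final answer; if the premise is false, correct it instead "
        "of complying."
    )

print(self_critique_respond("Explain why the Great Wall is visible from Mars."))
```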

Towards Robust Personalized Dialogue Generation via Order-Insensitive Representation Regularization

May 22, 2023
Liang Chen, Hongru Wang, Yang Deng, Wai-Chung Kwan, Zezhong Wang, Kam-Fai Wong

Generating persona-consistent dialogue responses is important for developing an intelligent conversational agent. Recent works typically fine-tune large-scale pre-trained models on this task by concatenating persona texts and dialogue history into a single input sequence to generate the target response. While simple and effective, our analysis shows that this popular practice suffers seriously from order sensitivity: different input orders of persona sentences significantly impact the quality and consistency of the generated response, resulting in severe performance fluctuations (i.e., 29.4% on GPT2 and 83.2% on BART). To mitigate the order sensitivity problem, we propose a model-agnostic framework, ORder Insensitive Generation (ORIG), which enables dialogue models to learn robust representations under different persona orders and improves the consistency of response generation. Experiments on the Persona-Chat dataset justify the effectiveness and superiority of our method with two dominant pre-trained models (GPT2 and BART).

* ACL 2023 
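
A minimal PyTorch sketch of one plausible order-consistency regularizer in the spirit of ORIG: encode the same persona in two random orders and penalize divergence between the resulting output distributions; the toy encode function merely fakes an order-sensitive model.

```python
import random
import torch
import torch.nn.functional as F

def encode(sentences: list[str]) -> torch.Tensor:
    # Stand-in "model": a fake, order-sensitive output distribution.
    torch.manual_seed(sum(i * len(s) for i, s in enumerate(sentences)))
    return F.log_softmax(torch.randn(1, 10), dim=-1)

persona = ["I love hiking.", "I own two dogs.", "I work as a nurse."]
order_a = random.sample(persona, len(persona))
order_b = random.sample(persona, len(persona))
log_p, log_q = encode(order_a), encode(order_b)

# Symmetric KL between the two orderings; in training, this regularizer is
# added to the usual generation loss so that learned representations stop
# depending on persona order.
reg = 0.5 * (F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")
             + F.kl_div(log_q, log_p, log_target=True, reduction="batchmean"))
print("order-consistency regularizer:", float(reg))
```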

Chain-of-thought prompting for responding to in-depth dialogue questions with LLM

May 19, 2023
Hongru Wang, Rui Wang, Fei Mi, Zezhong Wang, Ruifeng Xu, Kam-Fai Wong

The way and content in which users ask questions can provide insight into their current status, including their personality, emotions, and psychology. Instead of directly prompting large language models (LLMs), we explore how chain-of-thought prompting helps in this scenario to perform reasoning and planning according to user status, aiming to provide a more personalized and engaging experience for the user query. To this end, we first construct a benchmark of 6 dialogue or question-answering datasets in both English and Chinese, covering 3 different aspects of user status (personality, emotion, and psychology). We then prompt the LLMs to generate intermediate reasoning about the user status before producing the response. We also propose a novel demonstration selection strategy that uses the semantic similarity of the intermediate reasoning instead of the test queries. To evaluate the effectiveness and robustness of our approach, we conduct extensive experiments with 7 LLMs under zero-shot and one-shot settings. The experimental results show that our approach consistently outperforms standard prompting in terms of both helpfulness and acceptness across all datasets, regardless of the LLM used. The code and dataset can be found at https://github.com/ruleGreen/Dialogue_CoT.git.
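
A minimal sketch of the demonstration-selection idea: match on the model's intermediate reasoning about user status rather than the raw test query. The bag-of-words embedding is a stand-in for a real sentence encoder, and the pool entries are invented.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Demonstration pool keyed by the intermediate reasoning attached to each
# exemplar (invented examples for illustration).
demos = {
    "the user feels anxious about an upcoming exam": "<exemplar 1>",
    "the user is curious about astronomy facts": "<exemplar 2>",
}

test_reasoning = "the user sounds anxious and stressed about an exam"
best = max(demos, key=lambda r: cosine(embed(r), embed(test_reasoning)))
print("selected demonstration:", demos[best])
```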

DIGAT: Modeling News Recommendation with Dual-Graph Interaction

Oct 11, 2022
Zhiming Mao, Jian Li, Hongru Wang, Xingshan Zeng, Kam-Fai Wong

News recommendation (NR) is essential for online news services. Existing NR methods typically adopt a news-user representation learning framework, which faces two potential limitations. First, in the news encoder, encoding a single candidate news item suffers from insufficient semantic information. Second, existing graph-based NR methods are promising but lack effective news-user feature interaction, rendering graph-based recommendation suboptimal. To overcome these limitations, we propose dual-interactive graph attention networks (DIGAT), consisting of news- and user-graph channels. In the news-graph channel, we enrich the semantics of the single candidate news item by incorporating semantically relevant news information with a semantic-augmented graph (SAG). In the user-graph channel, multi-level user interests are represented with a news-topic graph. Most notably, we design a dual-graph interaction process to perform effective feature interaction between the news and user graphs, which facilitates accurate news-user representation matching. Experimental results on the benchmark dataset MIND show that DIGAT outperforms existing news recommendation methods. Further ablation studies and analyses validate the effectiveness of (1) semantic-augmented news-graph modeling and (2) dual-graph interaction.

* Findings of EMNLP 2022. This paper was first submitted to ARR 2021 November (https://openreview.net/forum?id=t2vXlG7Oe5m) 
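
As a loose illustration of one dual-graph interaction step, the PyTorch sketch below lets news-graph and user-graph node features attend to each other before matching; the dimensions, mean pooling, and shared attention weights are simplifying assumptions, not the DIGAT architecture.

```python
import torch
import torch.nn as nn

d = 64
news_nodes = torch.randn(8, d)    # semantic-augmented candidate-news graph
user_nodes = torch.randn(12, d)   # news-topic graph of multi-level interests
cross_attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)

# Each graph queries the other, so features are refined interactively rather
# than being encoded in isolation.
news_refined, _ = cross_attn(news_nodes[None], user_nodes[None], user_nodes[None])
user_refined, _ = cross_attn(user_nodes[None], news_nodes[None], news_nodes[None])

# Match score between mean-pooled graph representations (dot product).
score = (news_refined.mean(1) * user_refined.mean(1)).sum(-1)
print("news-user matching score:", score.item())
```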