Weizhou Shen

Retrieval-Generation Alignment for End-to-End Task-Oriented Dialogue System

Oct 20, 2023
Weizhou Shen, Yingqi Gao, Canbin Huang, Fanqi Wan, Xiaojun Quan, Wei Bi

Developing an efficient retriever to retrieve knowledge from a large-scale knowledge base (KB) is critical for task-oriented dialogue systems to effectively handle localized and specialized tasks. However, widely used generative models such as T5 and ChatGPT often struggle to discern subtle differences among the retrieved KB records when generating responses, resulting in suboptimal response quality. In this paper, we propose the application of maximal marginal likelihood to train a perceptive retriever by utilizing signals from response generation for supervision. In addition, our approach goes beyond considering solely retrieved entities and incorporates various meta knowledge to guide the generator, thus improving the utilization of knowledge. We evaluate our approach on three task-oriented dialogue datasets using T5 and ChatGPT as the backbone models. The results demonstrate that when combined with meta knowledge, the response generator can effectively leverage high-quality knowledge records from the retriever and enhance the quality of generated responses. The code and models for this paper are available at https://github.com/shenwzh3/MK-TOD.
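
As an illustration of the maximal-marginal-likelihood idea above, the sketch below trains a retriever by marginalizing the generator's response likelihood over the top-K retrieved KB records. It is a minimal sketch under assumed tensor shapes and hypothetical names, not the released MK-TOD code.

```python
# Minimal sketch of maximal-marginal-likelihood (MML) retriever training.
import torch
import torch.nn.functional as F

def mml_loss(retriever_scores, generator_loglik):
    """
    retriever_scores: (K,) unnormalized scores for the top-K retrieved KB records
    generator_loglik: (K,) log p(response | context, record_k) from the generator
    The retriever is supervised by how useful each record is for generation:
    loss = -log sum_k p(record_k | context) * p(response | context, record_k)
    """
    log_p_record = F.log_softmax(retriever_scores, dim=-1)   # log p(z | x)
    joint = log_p_record + generator_loglik                  # log p(z | x) + log p(y | x, z)
    return -torch.logsumexp(joint, dim=-1)                   # -log p(y | x)

# Toy example with K = 4 retrieved records
scores = torch.randn(4, requires_grad=True)
loglik = torch.tensor([-3.2, -1.1, -4.0, -2.5])
loss = mml_loss(scores, loglik)
loss.backward()   # gradients flow back into the retriever scores
```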

* Accepted to EMNLP 2023 Main Conference 

ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models

Sep 02, 2023
Chenliang Li, Hehong Chen, Ming Yan, Weizhou Shen, Haiyang Xu, Zhikai Wu, Zhicheng Zhang, Wenmeng Zhou, Yingda Chen, Chen Cheng, Hongzhu Shi, Ji Zhang, Fei Huang, Jingren Zhou

Large language models (LLMs) have recently demonstrated remarkable capabilities to comprehend human intentions, engage in reasoning, and design planning-like behavior. To further unleash the power of LLMs to accomplish complex tasks, there is a growing trend to build agent frameworks that equip LLMs, such as ChatGPT, with tool-use abilities to connect with massive external APIs. In this work, we introduce ModelScope-Agent, a general and customizable agent framework for real-world applications, based on open-source LLMs as controllers. It provides a user-friendly system library with a customizable engine design that supports model training on multiple open-source LLMs, while also enabling seamless integration with both model APIs and common APIs in a unified way. To equip the LLMs with tool-use abilities, a comprehensive framework has been proposed spanning tool-use data collection, tool retrieval, tool registration, memory control, customized model training, and evaluation for practical real-world applications. Finally, we showcase ModelScopeGPT, a real-world intelligent assistant of the ModelScope Community based on the ModelScope-Agent framework, which is able to connect open-source LLMs with more than 1000 public AI models and localized community knowledge in ModelScope. The ModelScope-Agent library (https://github.com/modelscope/modelscope-agent) and online demo (https://modelscope.cn/studios/damo/ModelScopeGPT/summary) are now publicly available.
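
To make the tool-use pipeline concrete, here is a deliberately simplified, hypothetical sketch of the retrieve-call-respond loop such a framework coordinates. The names (ToolRegistry, run_agent, llm) are placeholders for illustration and are not the ModelScope-Agent API.

```python
# Hypothetical tool-use agent loop: retrieve a tool, call it, feed the result back.
from typing import Callable, Dict

class ToolRegistry:
    def __init__(self):
        self.tools: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn

    def retrieve(self, query: str) -> str:
        # Real systems use embedding-based tool retrieval; here we match by name.
        return next((name for name in self.tools if name in query), "")

def run_agent(llm: Callable[[str], str], registry: ToolRegistry, user_query: str) -> str:
    # 1) Ask the LLM controller which tool the query needs.
    plan = llm(f"Decide which tool to use for: {user_query}")
    # 2) Retrieve and call the tool (if any), producing an observation.
    tool_name = registry.retrieve(plan)
    observation = registry.tools[tool_name](user_query) if tool_name else ""
    # 3) Let the LLM compose the final answer from the observation.
    return llm(f"Question: {user_query}\nTool result: {observation}\nAnswer:")
```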

Multi-Grained Knowledge Retrieval for End-to-End Task-Oriented Dialog

May 17, 2023
Fanqi Wan, Weizhou Shen, Ke Yang, Xiaojun Quan, Wei Bi

Retrieving proper domain knowledge from an external database lies at the heart of end-to-end task-oriented dialog systems to generate informative responses. Most existing systems blend knowledge retrieval with response generation and optimize them with direct supervision from reference responses, leading to suboptimal retrieval performance when the knowledge base becomes large-scale. To address this, we propose to decouple knowledge retrieval from response generation and introduce a multi-grained knowledge retriever (MAKER) that includes an entity selector to search for relevant entities and an attribute selector to filter out irrelevant attributes. To train the retriever, we propose a novel distillation objective that derives supervision signals from the response generator. Experiments conducted on three standard benchmarks with both small and large-scale knowledge bases demonstrate that our retriever performs knowledge retrieval more effectively than existing methods. Our code is publicly available at https://github.com/18907305772/MAKER.
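
The decoupled, multi-grained retrieval described above can be pictured with a small sketch: an entity selector over KB rows, an attribute selector over columns, and a distillation loss that pulls the retriever's entity distribution toward the generator's preferences. This is an illustrative sketch with assumed tensor shapes, not the released MAKER code.

```python
# Illustrative two-stage (entity + attribute) selection with generator distillation.
import torch
import torch.nn.functional as F

def select_entities(entity_scores, k):
    """entity_scores: (num_entities,) relevance of each KB row to the dialogue."""
    return torch.topk(entity_scores, k).indices

def mask_attributes(kb_rows, attribute_scores, threshold=0.5):
    """kb_rows: (k, num_attrs); drop the columns the attribute selector rejects."""
    keep = torch.sigmoid(attribute_scores) > threshold
    return kb_rows[:, keep]

def distillation_loss(entity_scores, generator_scores, temperature=1.0):
    """Align the retriever's entity distribution with the generator's preferences."""
    student = F.log_softmax(entity_scores / temperature, dim=-1)
    teacher = F.softmax(generator_scores / temperature, dim=-1)
    return F.kl_div(student, teacher, reduction="sum")
```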

* Accepted to ACL 2023 (Main Conference) 

Generic Dependency Modeling for Multi-Party Conversation

Feb 21, 2023
Weizhou Shen, Xiaojun Quan, Ke Yang

To model the dependencies between utterances in multi-party conversations, we propose a simple and generic framework based on the dependency parsing results of utterances. In particular, we present an approach to encoding the dependencies in the form of relative dependency encoding (ReDE) and illustrate how to implement it in Transformers by modifying the computation of self-attention. Experimental results on four multi-party conversation benchmarks show that this framework successfully boosts the general performance of two Transformer-based language models and leads to comparable or even superior performance compared to state-of-the-art methods. The code is available at https://github.com/shenwzh3/ReDE.
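
As a rough illustration of injecting relative dependency information into self-attention, the sketch below adds a learnable bias, indexed by the clipped dependency distance between utterances, to the attention scores. It is a simplified single-head sketch under assumed inputs, not the exact ReDE implementation.

```python
# Single-head self-attention with a dependency-distance bias (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DependencyBiasedAttention(nn.Module):
    def __init__(self, d_model, max_dist=4):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # One learnable bias per clipped signed dependency distance
        self.dist_bias = nn.Embedding(2 * max_dist + 1, 1)
        self.max_dist = max_dist
        self.scale = d_model ** -0.5

    def forward(self, x, dep_dist):
        # x: (seq, d_model); dep_dist: (seq, seq) integer signed dependency distances
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-1, -2) * self.scale
        clipped = dep_dist.clamp(-self.max_dist, self.max_dist) + self.max_dist
        scores = scores + self.dist_bias(clipped).squeeze(-1)   # add dependency bias
        return F.softmax(scores, dim=-1) @ v
```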

* Accepted to ICASSP 2023 

Joint Generator-Ranker Learning for Natural Language Generation

Jun 28, 2022
Weizhou Shen, Yeyun Gong, Yelong Shen, Song Wang, Xiaojun Quan, Nan Duan, Weizhu Chen

Due to exposure bias, most existing natural language generation (NLG) models trained by maximizing the likelihood objective predict poor text results during the inference stage. In this paper, to tackle this problem, we revisit the generate-then-rank framework and propose a joint generator-ranker (JGR) training algorithm for text generation tasks. In JGR, the generator model is trained by maximizing two objectives: the likelihood of the training corpus and the expected reward given by the ranker model. Meanwhile, the ranker model takes input samples from the generator model and learns to distinguish good samples from the generation pool. The generator and ranker models are alternately optimized till convergence. In the empirical study, the proposed JGR model achieves new state-of-the-art performance on five public benchmarks covering three popular generation tasks: summarization, question generation, and response generation. We will make code, data, and models available at https://github.com/microsoft/AdvNLG.
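
The two training objectives described above can be written down compactly. The sketch below shows one plausible version: a policy-gradient style expected-reward term for the generator (with the mean reward as a baseline) and a hinge loss for the ranker. It is an illustrative sketch with assumed inputs, not the JGR release.

```python
# Illustrative generator and ranker losses for joint generator-ranker training.
import torch
import torch.nn.functional as F

def generator_loss(ref_loglik, sample_logliks, rewards, alpha=1.0):
    """
    ref_loglik: scalar log-likelihood of the reference output (MLE term)
    sample_logliks: (S,) log-likelihoods of S sampled candidates
    rewards: (S,) ranker scores for those candidates (treated as constants here)
    Combines maximum likelihood with an expected-reward term, using the
    mean reward as a simple variance-reducing baseline.
    """
    expected_reward = ((rewards - rewards.mean()) * sample_logliks).mean()
    return -ref_loglik - alpha * expected_reward

def ranker_loss(ref_score, cand_scores, margin=1.0):
    """Hinge loss pushing the reference's score above each sampled candidate's."""
    return F.relu(margin - ref_score + cand_scores).mean()
```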

* In progress 

Directed Acyclic Graph Network for Conversational Emotion Recognition

May 27, 2021
Weizhou Shen, Siyue Wu, Yunyi Yang, Xiaojun Quan

The modeling of conversational context plays a vital role in emotion recognition from conversation (ERC). In this paper, we put forward a novel idea of encoding the utterances with a directed acyclic graph (DAG) to better model the intrinsic structure within a conversation, and design a directed acyclic neural network, namely DAG-ERC, to implement this idea. In an attempt to combine the strengths of conventional graph-based neural models and recurrence-based neural models, DAG-ERC provides a more intuitive way to model the information flow between long-distance conversation background and nearby context. Extensive experiments are conducted on four ERC benchmarks with state-of-the-art models employed as baselines for comparison. The empirical results demonstrate the superiority of this new model and confirm the motivation of the directed acyclic graph architecture for ERC.
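
The core of the DAG recurrence can be sketched as follows: each utterance attends over the hidden states of its predecessors in the graph and fuses the aggregated message with its own representation through a recurrent cell. This is a simplified illustration with hypothetical names, not the exact DAG-ERC layers.

```python
# Illustrative message passing along a DAG of utterances.
import torch
import torch.nn as nn

class DAGLayer(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.attn = nn.Linear(2 * d_model, 1)
        self.gru = nn.GRUCell(d_model, d_model)

    def forward(self, h, parents):
        # h: (num_utts, d_model); parents[i]: predecessor indices of utterance i,
        # assumed to appear earlier in the sequence (topological order).
        out = []
        for i in range(h.size(0)):
            if parents[i]:
                preds = torch.stack([out[j] for j in parents[i]])            # (P, d)
                att = torch.softmax(
                    self.attn(torch.cat([h[i].expand_as(preds), preds], -1)).squeeze(-1), 0)
                msg = (att.unsqueeze(-1) * preds).sum(0)                     # aggregate predecessors
            else:
                msg = torch.zeros_like(h[i])
            out.append(self.gru(h[i].unsqueeze(0), msg.unsqueeze(0)).squeeze(0))
        return torch.stack(out)
```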

* ACL 2021 main conference 

DialogXL: All-in-One XLNet for Multi-Party Conversation Emotion Recognition

Dec 16, 2020
Weizhou Shen, Junqing Chen, Xiaojun Quan, Zhixian Xie

This paper presents our pioneering effort for emotion recognition in conversation (ERC) with pre-trained language models. Unlike regular documents, conversational utterances appear alternately from different parties and are usually organized as hierarchical structures in previous work. Such structures are not conducive to the application of pre-trained language models such as XLNet. To address this issue, we propose an all-in-one XLNet model, namely DialogXL, with enhanced memory to store longer historical context and dialog-aware self-attention to deal with the multi-party structures. Specifically, we first modify the recurrence mechanism of XLNet from segment-level to utterance-level in order to better model the conversational data. Second, we introduce dialog-aware self-attention in place of the vanilla self-attention in XLNet to capture useful intra- and inter-speaker dependencies. Extensive experiments are conducted on four ERC benchmarks with mainstream models presented for comparison. The experimental results show that the proposed model outperforms the baselines on all the datasets. Several other experiments, such as an ablation study and error analysis, are also conducted, and the results confirm the role of the critical modules of DialogXL.
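
The intra-/inter-speaker split behind dialog-aware self-attention can be illustrated with a small masking sketch: some attention heads see only tokens from the same speaker, others only tokens from different speakers. This is an illustrative sketch under assumed inputs, not DialogXL's implementation.

```python
# Build intra- and inter-speaker attention masks for a flattened dialog.
import torch

def speaker_masks(speaker_ids):
    """
    speaker_ids: (seq_len,) integer speaker id for every token in the flattened dialog.
    Returns boolean (seq_len, seq_len) masks where True means "may attend".
    """
    same = speaker_ids.unsqueeze(0) == speaker_ids.unsqueeze(1)
    intra_mask = same      # attend only within the same speaker's tokens
    inter_mask = ~same     # attend only across speakers
    return intra_mask, inter_mask

# Example: tokens from speakers A, A, B, A, B
ids = torch.tensor([0, 0, 1, 0, 1])
intra, inter = speaker_masks(ids)
```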

* Accepted by AAAI 2021 main conference 

Relational Graph Attention Network for Aspect-based Sentiment Analysis

Apr 26, 2020
Kai Wang, Weizhou Shen, Yunyi Yang, Xiaojun Quan, Rui Wang

Aspect-based sentiment analysis aims to determine the sentiment polarity towards a specific aspect in online reviews. Most recent efforts adopt attention-based neural network models to implicitly connect aspects with opinion words. However, due to the complexity of language and the existence of multiple aspects in a single sentence, these models often confuse the connections. In this paper, we address this problem by means of effective encoding of syntax information. Firstly, we define a unified aspect-oriented dependency tree structure rooted at a target aspect by reshaping and pruning an ordinary dependency parse tree. Then, we propose a relational graph attention network (R-GAT) to encode the new tree structure for sentiment prediction. Extensive experiments are conducted on the SemEval 2014 and Twitter datasets, and the experimental results confirm that the connections between aspects and opinion words can be better established with our approach, and the performance of the graph attention network (GAT) is significantly improved as a consequence.
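
A relation-aware attention step of the kind described above can be sketched as follows: each neighbor of the aspect node contributes according to an attention weight that depends on the dependency-relation label of the connecting edge. This is a minimal single-head sketch with assumed inputs, not the released R-GAT code.

```python
# Illustrative relation-aware attention over an aspect's neighbors in a dependency tree.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationalAttention(nn.Module):
    def __init__(self, d_model, num_relations):
        super().__init__()
        self.rel_emb = nn.Embedding(num_relations, d_model)
        self.score = nn.Linear(2 * d_model, 1)

    def forward(self, aspect, neighbors, relations):
        # aspect: (d,), neighbors: (N, d), relations: (N,) edge-label ids
        rel = self.rel_emb(relations)                               # (N, d)
        feats = torch.cat([neighbors + rel,
                           aspect.unsqueeze(0).expand_as(neighbors)], dim=-1)
        att = F.softmax(self.score(feats).squeeze(-1), dim=0)       # (N,)
        return (att.unsqueeze(-1) * neighbors).sum(0)               # aggregated aspect context
```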

* To appear at ACL 2020 