Yasheng Wang

DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering

Jul 13, 2023
Pei Ke, Fei Huang, Fei Mi, Yasheng Wang, Qun Liu, Xiaoyan Zhu, Minlie Huang

Existing evaluation metrics for natural language generation (NLG) tasks face challenges in generalization ability and interpretability. Specifically, most well-performing metrics must be trained on evaluation datasets of specific NLG tasks and evaluation dimensions, which may cause over-fitting to task-specific datasets. Furthermore, existing metrics only provide an evaluation score for each dimension without revealing the evidence that explains how the score is obtained. To address these challenges, we propose a simple yet effective metric called DecompEval. This metric formulates NLG evaluation as an instruction-style question answering task and utilizes instruction-tuned pre-trained language models (PLMs) without training on evaluation datasets, aiming to enhance generalization ability. To make the evaluation process more interpretable, we decompose our devised instruction-style question about the quality of generated texts into subquestions that measure the quality of each sentence. The subquestions and their answers generated by PLMs are then recomposed as evidence to obtain the evaluation result. Experimental results show that DecompEval achieves state-of-the-art performance among untrained metrics for evaluating text summarization and dialogue generation, and also exhibits strong dimension-level / task-level generalization ability and interpretability.
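
To make the recipe concrete, here is a minimal sketch of decomposed-QA-style evaluation, assuming FLAN-T5 stands in for the instruction-tuned PLM; the prompt template, the yes/no scoring, and the averaging step are illustrative placeholders, not DecompEval's exact design.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

def yes_probability(question: str) -> float:
    """Probability mass the model places on 'yes' versus 'no' at step one."""
    inputs = tokenizer(question, return_tensors="pt")
    start = torch.tensor([[model.config.decoder_start_token_id]])
    logits = model(**inputs, decoder_input_ids=start).logits[0, -1]
    yes_id = tokenizer("yes", add_special_tokens=False).input_ids[0]
    no_id = tokenizer("no", add_special_tokens=False).input_ids[0]
    return torch.softmax(logits[[yes_id, no_id]], dim=-1)[0].item()

def decomp_eval(context: str, sentences: list[str], dimension: str) -> float:
    # One subquestion per sentence of the generated text; the per-sentence
    # answers are then recomposed (here: averaged) into a dimension score.
    scores = [
        yes_probability(
            f"Context: {context}\nSentence: {s}\n"
            f"Question: Is this sentence {dimension}? Answer yes or no."
        )
        for s in sentences
    ]
    return sum(scores) / len(scores)
```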

* Accepted by ACL 2023 (Main Conference) 

Learning Summary-Worthy Visual Representation for Abstractive Summarization in Video

May 08, 2023
Zenan Xu, Xiaojun Meng, Yasheng Wang, Qinliang Su, Zexuan Qiu, Xin Jiang, Qun Liu

Multimodal abstractive summarization for videos (MAS) requires generating a concise textual summary that describes the highlights of a video from multimodal resources, in our case the video content and its transcript. Inspired by the success of large-scale generative pre-trained language models (GPLMs) in generating high-quality text (e.g., summaries), recent MAS methods adapt a GPLM to this task by equipping it with visual information, often obtained through a general-purpose visual feature extractor. However, such generically extracted visual features may overlook summary-worthy visual information, which impedes model performance. In this work, we propose a novel approach to learning summary-worthy visual representations that facilitate abstractive summarization. Our method exploits summary-worthy information from both the cross-modal transcript data and the knowledge distilled from the pseudo summary. Extensive experiments on three public multimodal datasets show that our method outperforms all competing baselines. Furthermore, thanks to the summary-worthy visual information, our model achieves significant improvements on small datasets, or even on datasets with limited training data.
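
As a toy illustration of "summary-worthy" selection (an assumption about the intuition, not the paper's actual model), one can score frame features by their similarity to transcript features and keep the best-matching frames:

```python
import torch
import torch.nn.functional as F

def summary_worthy_frames(frame_feats: torch.Tensor,
                          text_feats: torch.Tensor,
                          top_k: int = 8) -> torch.Tensor:
    # frame_feats: (num_frames, d); text_feats: (num_tokens, d)
    sim = F.normalize(frame_feats, dim=-1) @ F.normalize(text_feats, dim=-1).T
    scores = sim.max(dim=-1).values          # best-matching token per frame
    idx = scores.topk(min(top_k, len(scores))).indices
    return frame_feats[idx]                  # keep only summary-worthy frames
```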

* Accepted by IJCAI-2023 

MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Constructing Moral Discussions

Dec 21, 2022
Hao Sun, Zhexin Zhang, Fei Mi, Yasheng Wang, Wei Liu, Jianwei Cui, Bin Wang, Qun Liu, Minlie Huang

Morality in dialogue systems has attracted considerable research attention recently. A moral dialogue system can better connect with users and enhance conversation engagement by gaining their trust. In this paper, we propose a framework, MoralDial, to train and evaluate moral dialogue systems. In our framework, we first explore the communication mechanisms of morality and resolve expressed morality into four sub-modules, which indicate a roadmap for building a moral dialogue system. Based on that, we design a simple yet effective method: constructing moral discussions from Rules of Thumb (RoTs) between simulated users and the dialogue system. The constructed discussions consist of expressing, explaining, and revising moral views in dialogue exchanges, which lets conversational models learn morality in a natural manner. Furthermore, we propose a novel evaluation method within the framework: we evaluate multiple aspects of morality by judging the relation between dialogue responses and RoTs in discussions, with particular attention to the multifaceted nature of morality. Automatic and manual experiments demonstrate that our framework is promising for training and evaluating moral dialogue systems.
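
To give the evaluation idea some shape, a hedged sketch: judge the relation between a response and an RoT with an off-the-shelf NLI classifier. MoralDial's actual judge is its own design; the model and labels below are stand-ins.

```python
from transformers import pipeline

# Off-the-shelf NLI model as a stand-in judge.
nli = pipeline("text-classification", model="roberta-large-mnli")

def rot_agreement(response: str, rot: str) -> str:
    """Relation between a dialogue response and a Rule of Thumb."""
    result = nli([{"text": response, "text_pair": rot}])[0]
    return result["label"]  # ENTAILMENT / NEUTRAL / CONTRADICTION
```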

MultiCoder: Multi-Programming-Lingual Pre-Training for Low-Resource Code Completion

Dec 19, 2022
Zi Gong, Yinpeng Guo, Pingyi Zhou, Cuiyun Gao, Yasheng Wang, Zenglin Xu

Code completion is a valuable topic in both academia and industry. Recently, large-scale mono-programming-lingual (MonoPL) pre-trained models have been proposed to boost code completion performance. However, code completion for low-resource programming languages (PLs) remains difficult under the data-driven paradigm, even though plenty of developers use such languages. Meanwhile, few studies have explored the effects of multi-programming-lingual (MultiPL) pre-training on code completion, especially its impact on low-resource programming languages. To this end, we propose MultiCoder, which enhances low-resource code completion via MultiPL pre-training and MultiPL Mixture-of-Experts (MoE) layers. We further propose a novel PL-level MoE routing strategy (PL-MoE) for improving code completion on all PLs. Experimental results on CodeXGLUE and MultiCC demonstrate that 1) MultiCoder significantly outperforms the MonoPL baselines on low-resource programming languages, and 2) the PL-MoE module further boosts performance on six programming languages. In addition, we analyze the effects of the proposed method in detail and explore its effectiveness in a variety of scenarios.
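
The PL-level routing idea admits a very small sketch (dimensions and the hard one-expert-per-language rule are assumptions; the paper's full design also covers MultiPL pre-training):

```python
import torch
import torch.nn as nn

class PLMoE(nn.Module):
    """Mixture-of-Experts layer routed by programming-language id."""

    def __init__(self, d_model: int, num_pls: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_pls)
        )

    def forward(self, hidden: torch.Tensor, pl_id: int) -> torch.Tensor:
        # Every token of the batch goes to the expert owned by its PL.
        return self.experts[pl_id](hidden)
```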

Momentum Contrastive Pre-training for Question Answering

Dec 12, 2022
Minda Hu, Muzhi Li, Yasheng Wang, Irwin King

Existing pre-training methods for extractive Question Answering (QA) generate cloze-like queries that differ from natural questions in syntactic structure, which can overfit pre-trained models to simple keyword matching. To address this problem, we propose Momentum Contrastive pRe-training fOr queStion anSwering (MCROSS), a novel pre-training method for extractive QA. Specifically, MCROSS introduces a momentum contrastive learning framework to align the answer probabilities between cloze-like and natural query-passage sample pairs, so that pre-trained models can better transfer the knowledge learned from cloze-like samples to answering natural questions. Experimental results on three benchmark QA datasets show that our method achieves noticeable improvements over all baselines in both supervised and zero-shot scenarios.
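
The momentum-contrastive backbone is the MoCo-style slow-moving key encoder; below is a standard sketch of that momentum update (the QA-specific alignment loss is omitted, and m=0.999 is an assumption):

```python
import torch

@torch.no_grad()
def momentum_update(query_encoder: torch.nn.Module,
                    key_encoder: torch.nn.Module,
                    m: float = 0.999) -> None:
    # The key encoder trails the query encoder as an exponential moving
    # average, keeping the contrastive targets slowly varying and consistent.
    for q, k in zip(query_encoder.parameters(), key_encoder.parameters()):
        k.data.mul_(m).add_(q.data, alpha=1.0 - m)
```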

* Accepted by EMNLP 2022; the ACL Anthology reference will be added soon 

Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation

Dec 04, 2022
Zhexin Zhang, Jiale Cheng, Hao Sun, Jiawen Deng, Fei Mi, Yasheng Wang, Lifeng Shang, Minlie Huang

Large pretrained language models can easily produce toxic or biased content, which is prohibitive for practical use. To detect such toxic generations, existing methods rely on templates, real-world data extraction, crowdsourced workers, or automatic generation to construct adversarial contexts that are likely to induce toxic generations. However, what type of context is more likely to induce unsafe responses is still under-explored. In this paper, we identify that context toxicity and context category (e.g., profanity, insult, drugs, etc.) are two important factors behind safety issues in response generation. Hence, we propose a method called reverse generation to construct adversarial contexts conditioned on a given response, with the flexibility to control the category, toxicity level, and inductivity of the generated contexts. Via reverse generation, we augment the existing BAD dataset and construct a new dataset, BAD+, which contains more than 120K diverse and highly inductive contexts in 12 categories. We test three popular pretrained dialogue models (Blender, DialoGPT, and Plato2) and find that BAD+ can largely expose their safety problems. Furthermore, we show that BAD+ can greatly enhance the safety of generation and reveal the key factors behind safety improvement. Our code and dataset are available at https://github.com/thu-coai/Reverse_Generation.
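
A hypothetical sketch of how such conditioning might be exposed to a generator via control codes; the tag format below is illustrative, not the paper's exact scheme:

```python
def reverse_generation_prompt(response: str, category: str,
                              toxicity: str, inductivity: str) -> str:
    # Control codes steer the category, toxicity level, and inductivity of
    # the context to be generated for the given response.
    return (f"[category={category}] [toxicity={toxicity}] "
            f"[inductivity={inductivity}] Response: {response} Context:")
```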

* Findings of EMNLP 2022 

KPT: Keyword-guided Pre-training for Grounded Dialog Generation

Dec 04, 2022
Qi Zhu, Fei Mi, Zheng Zhang, Yasheng Wang, Yitong Li, Xin Jiang, Qun Liu, Xiaoyan Zhu, Minlie Huang

Incorporating external knowledge into the response generation process is essential to building more helpful and reliable dialog agents. However, collecting knowledge-grounded conversations is often costly, calling for a better pre-trained model for grounded dialog generation that generalizes well across different types of knowledge. In this work, we propose KPT (Keyword-guided Pre-Training), a novel self-supervised pre-training method for grounded dialog generation that does not rely on extra knowledge annotation. Specifically, we use a pre-trained language model to extract the most uncertain tokens in the dialog as keywords. With these keywords, we construct two kinds of knowledge and pre-train a knowledge-grounded response generation model to handle two different scenarios: (1) the knowledge should be faithfully grounded; (2) the knowledge can be selectively used. For the former, the grounding knowledge consists of keywords extracted from the response. For the latter, the grounding knowledge is additionally augmented with keywords extracted from other utterances in the same dialog. Since the knowledge is extracted from the dialog itself, KPT can easily be applied to a large volume and variety of dialogue data. We consider three data sources (open-domain, task-oriented, conversational QA) totaling 2.5M dialogues, and conduct extensive experiments on various few-shot knowledge-grounded generation tasks, including grounding on dialog acts, knowledge graphs, persona descriptions, and Wikipedia passages. Our comprehensive experiments and analyses demonstrate that KPT consistently outperforms state-of-the-art methods on these tasks with diverse grounding knowledge.
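
The keyword step can be sketched directly: mask each word in turn and treat the words the PLM is least certain about as keywords. The scoring rule and the BERT stand-in below are assumptions in the spirit of KPT, not its implementation (multi-subtoken words are scored by their first subtoken only).

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def extract_keywords(utterance: str, top_k: int = 3) -> list[str]:
    words = utterance.split()
    losses = []
    for i, word in enumerate(words):
        masked = " ".join(tokenizer.mask_token if j == i else w
                          for j, w in enumerate(words))
        inputs = tokenizer(masked, return_tensors="pt")
        pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
        target = tokenizer(word, add_special_tokens=False).input_ids[0]
        log_probs = torch.log_softmax(model(**inputs).logits[0, pos], dim=-1)
        losses.append(-log_probs[target].item())  # high loss = high uncertainty
    top = sorted(range(len(words)), key=lambda i: losses[i], reverse=True)[:top_k]
    return [words[i] for i in sorted(top)]
```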

* Accepted by AAAI 2023 

Lexicon-injected Semantic Parsing for Task-Oriented Dialog

Nov 26, 2022
Xiaojun Meng, Wenlin Dai, Yasheng Wang, Baojun Wang, Zhiyong Wu, Xin Jiang, Qun Liu

Recently, semantic parsing using hierarchical representations for dialog systems has attracted substantial attention. Task-Oriented Parse (TOP), a tree representation with intents and slots as labels of nested tree nodes, has been proposed for parsing user utterances. Previous TOP parsing methods are limited in handling unseen dynamic slot values (e.g., newly added songs and locations), an urgent matter for real dialog systems. To mitigate this issue, we first propose a novel span-splitting representation for span-based parsers that outperforms existing methods. We then present a novel lexicon-injected semantic parser, which collects the slot labels of the tree representation as a lexicon and injects lexical features into the parser's span representations. An additional slot disambiguation technique removes inappropriate span match occurrences from the lexicon. Our best parser produces a new state-of-the-art result (87.62%) on the TOP dataset and adapts to frequently updated slot lexicon entries in real task-oriented dialog without retraining.
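
Lexicon injection can be pictured with a short sketch: if a span's surface form is found in the slot lexicon, its slot label's embedding is appended to the span representation. Names and dimensions here are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LexiconInjector(nn.Module):
    def __init__(self, slot_labels: list[str], d_model: int):
        super().__init__()
        self.label_to_id = {label: i for i, label in enumerate(slot_labels)}
        # One extra embedding for spans with no lexicon match.
        self.embed = nn.Embedding(len(slot_labels) + 1, d_model)

    def forward(self, span_repr: torch.Tensor, span_text: str,
                lexicon: dict[str, str]) -> torch.Tensor:
        # Look up the span's surface form; fall back to the no-match id.
        label = lexicon.get(span_text)
        idx = self.label_to_id.get(label, len(self.label_to_id))
        lex_feat = self.embed(torch.tensor(idx))
        return torch.cat([span_repr, lex_feat], dim=-1)
```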

PanGu-Coder: Program Synthesis with Function-Level Language Modeling

Jul 22, 2022
Fenia Christopoulou, Gerasimos Lampouras, Milan Gritta, Guchun Zhang, Yinpeng Guo, Zhongqi Li, Qi Zhang, Meng Xiao, Bo Shen, Lin Li, Hao Yu, Li Yan, Pingyi Zhou, Xin Wang, Yuchi Ma, Ignacio Iacobacci, Yasheng Wang, Guangtai Liang, Jiansheng Wei, Xin Jiang, Qianxiang Wang, Qun Liu

We present PanGu-Coder, a pretrained decoder-only language model adopting the PanGu-Alpha architecture for text-to-code generation, i.e., the synthesis of programming language solutions given a natural language problem description. We train PanGu-Coder using a two-stage strategy: the first stage employs Causal Language Modelling (CLM) to pre-train on raw programming language data, while the second stage combines Causal Language Modelling and Masked Language Modelling (MLM) objectives that focus on the downstream task of text-to-code generation, training on loosely curated pairs of natural language program definitions and code functions. Finally, we discuss PanGu-Coder-FT, which is fine-tuned on a combination of competitive programming problems and code with continuous integration tests. We evaluate PanGu-Coder with a focus on whether it generates functionally correct programs and demonstrate that it achieves equivalent or better performance than similarly sized models, such as CodeX, while attending to a smaller context window and training on less data.
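
A toy sketch of the two-stage data layout described above; the separator tokens are assumptions, not PanGu-Coder's actual vocabulary.

```python
def stage1_example(raw_code: str) -> str:
    # Stage 1: plain causal language modelling over raw code.
    return raw_code

def stage2_example(description: str, solution: str) -> str:
    # Stage 2: loosely curated pairs of a natural-language problem
    # definition and its code function, for text-to-code generation.
    return f"<descr> {description} <code> {solution}"
```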

* 27 pages 

Sparse Structure Search for Parameter-Efficient Tuning

Jun 15, 2022
Shengding Hu, Zhen Zhang, Ning Ding, Yadao Wang, Yasheng Wang, Zhiyuan Liu, Maosong Sun

Adapting large pre-trained models (PTMs) through fine-tuning imposes prohibitive computational and storage burdens. Recent studies of parameter-efficient tuning (PET) find that optimizing only a small portion of parameters conditioned on PTMs can yield on-par performance compared to conventional fine-tuning. Generally, PET methods exquisitely design parameter-efficient modules (PET modules) that can be applied at arbitrary fine-grained positions inside PTMs. However, the effectiveness of these fine-grained positions largely relies on sophisticated manual designation, which usually produces sub-optimal results. In contrast to manual designation, we explore constructing PET modules automatically: we Search for the Sparse Structure of Parameter-Efficient Tuning (S3PET). Based on a unified framework covering various PET methods, S3PET conducts a differentiable PET structure search through bi-level optimization and proposes a shifted global sigmoid method to explicitly control the number of trainable parameters. Extensive experiments show that S3PET surpasses manual and random structures with fewer trainable parameters. The searched structures preserve more than 99% of fine-tuning performance with 0.01% trainable parameters. Moreover, the advantage of S3PET is amplified under extremely low trainable-parameter budgets (0.0009%-0.01%). The searched structures are transferable and explainable, providing suggestions and guidance for the future design of PET methods.
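
A rough sketch of a budget-controlled gate in the spirit of the shifted global sigmoid (the exact parameterization in the paper may differ; the single global shift below is an assumption):

```python
import torch

def shifted_global_sigmoid(alpha: torch.Tensor, budget: float) -> torch.Tensor:
    """alpha: one learnable logit per candidate PET-module position."""
    probs = torch.sigmoid(alpha)
    # A single global shift moves all gate probabilities so the expected
    # fraction of open gates matches the trainable-parameter budget.
    shift = probs.mean() - budget
    return (probs - shift).clamp(0.0, 1.0)

# Example: keep roughly 1% of 10,000 candidate positions trainable.
gates = shifted_global_sigmoid(torch.randn(10_000), budget=0.01)
```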
