Dejiao Zhang

Exploring Continual Learning for Code Generation Models

Jul 05, 2023
Prateek Yadav, Qing Sun, Hantian Ding, Xiaopeng Li, Dejiao Zhang, Ming Tan, Xiaofei Ma, Parminder Bhatia, Ramesh Nallapati, Murali Krishna Ramanathan, Mohit Bansal, Bing Xiang

Large-scale code generation models such as Codex and CodeT5 have achieved impressive performance. However, libraries are upgraded or deprecated very frequently, and re-training large-scale language models is computationally expensive. Therefore, Continual Learning (CL) is an important aspect that remains underexplored in the code domain. In this paper, we introduce a benchmark called CodeTask-CL that covers a wide range of tasks, including code generation, translation, summarization, and refinement, with different input and output programming languages. Next, on our CodeTask-CL benchmark, we compare popular CL techniques from the NLP and vision domains. We find that effective methods like Prompt Pooling (PP) suffer from catastrophic forgetting due to unstable training of the prompt selection mechanism caused by stark distribution shifts in coding tasks. We address this issue with our proposed method, Prompt Pooling with Teacher Forcing (PP-TF), which stabilizes training by enforcing constraints on the prompt selection mechanism and leads to a 21.54% improvement over Prompt Pooling. Along with the benchmark, we establish a training pipeline that can be used for CL on code models, which we believe can motivate further development of CL methods for code models. Our code is available at https://github.com/amazon-science/codetaskcl-pptf
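For a concrete picture, below is a minimal PyTorch sketch of prompt pooling with a teacher-forced (task-constrained) selection step. The pool size, top-k selection, and the way each task is restricted to a prompt subset are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptPool(nn.Module):
    """Shared pool of soft prompts selected via query-key matching. With teacher
    forcing, each task is restricted to a fixed subset of prompts, which
    stabilizes the selection mechanism under stark distribution shifts."""

    def __init__(self, pool_size=20, prompt_len=8, dim=768, top_k=4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(pool_size, dim))
        self.prompts = nn.Parameter(torch.randn(pool_size, prompt_len, dim))
        self.top_k = top_k

    def forward(self, query, allowed_ids=None):
        # query: (batch, dim), e.g. a mean-pooled encoding of the input.
        scores = F.cosine_similarity(query.unsqueeze(1), self.keys.unsqueeze(0), dim=-1)
        if allowed_ids is not None:
            # Teacher forcing: only prompts assigned to the current task are selectable
            # (assumes len(allowed_ids) >= top_k).
            mask = torch.full_like(scores, float("-inf"))
            mask[:, allowed_ids] = 0.0
            scores = scores + mask
        top = scores.topk(self.top_k, dim=-1).indices   # (batch, top_k)
        selected = self.prompts[top]                    # (batch, top_k, prompt_len, dim)
        return selected.flatten(1, 2)                   # prepend to the input embeddings
```

During training, `allowed_ids` might come from a fixed task-to-prompt assignment; at inference the mask can be dropped so selection relies on the learned keys.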

* ACL 2023 

ContraGen: Effective Contrastive Learning For Causal Language Model

Oct 03, 2022
Nihal Jain, Dejiao Zhang, Wasi Uddin Ahmad, Zijian Wang, Feng Nan, Xiaopeng Li, Ming Tan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Xiaofei Ma, Bing Xiang

Despite exciting progress in large-scale language generation, the expressiveness of the learned representations is severely limited by the anisotropy issue, where the hidden representations are distributed into a narrow cone in the vector space. To address this issue, we present ContraGen, a novel contrastive learning framework that improves representations with better uniformity and discrimination. We assess ContraGen on a wide range of downstream tasks in natural and programming languages. We show that ContraGen can effectively enhance both the uniformity and the discrimination of the representations and lead to the desired improvements on various language understanding tasks where discriminative representations are crucial for attaining good performance. Specifically, we attain a 44% relative improvement on Semantic Textual Similarity tasks and 34% on Code-to-Code Search tasks. Furthermore, by improving the expressiveness of the representations, ContraGen also boosts source code generation capability, with a 9% relative improvement in execution accuracy on the HumanEval benchmark.
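As a rough illustration of the general recipe rather than the paper's exact objective, the sketch below adds a sequence-level InfoNCE term to the standard causal LM loss, assuming a HuggingFace-style model interface; the dropout-based positive pairs, the mean pooling, and the loss weight `alpha` are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_lm_loss(model, input_ids, attention_mask, temperature=0.05, alpha=1.0):
    """Causal LM loss plus a sequence-level InfoNCE term that spreads hidden
    representations apart (uniformity) while keeping two views of the same
    sequence close (discrimination). Positives come from two dropout-perturbed
    forward passes, an assumption made for illustration."""
    out1 = model(input_ids, attention_mask=attention_mask,
                 labels=input_ids, output_hidden_states=True)
    out2 = model(input_ids, attention_mask=attention_mask, output_hidden_states=True)

    def pool(hidden):  # mean-pool the last layer over non-padding tokens
        mask = attention_mask.unsqueeze(-1).float()
        return (hidden * mask).sum(1) / mask.sum(1)

    z1 = F.normalize(pool(out1.hidden_states[-1]), dim=-1)
    z2 = F.normalize(pool(out2.hidden_states[-1]), dim=-1)

    sim = z1 @ z2.t() / temperature                      # (batch, batch)
    labels = torch.arange(sim.size(0), device=sim.device)
    info_nce = F.cross_entropy(sim, labels)

    return out1.loss + alpha * info_nce
```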

* 10 pages 

Learning Dialogue Representations from Consecutive Utterances

May 26, 2022
Zhihan Zhou, Dejiao Zhang, Wei Xiao, Nicholas Dingwall, Xiaofei Ma, Andrew O. Arnold, Bing Xiang

Learning high-quality dialogue representations is essential for solving a variety of dialogue-oriented tasks, especially considering that dialogue systems often suffer from data scarcity. In this paper, we introduce Dialogue Sentence Embedding (DSE), a self-supervised contrastive learning method that learns effective dialogue representations suitable for a wide range of dialogue tasks. DSE learns from dialogues by taking consecutive utterances of the same dialogue as positive pairs for contrastive learning. Despite its simplicity, DSE achieves significantly better representation capability than other dialogue representation and universal sentence representation models. We evaluate DSE on five downstream dialogue tasks that examine dialogue representation at different semantic granularities. Experiments in few-shot and zero-shot settings show that DSE outperforms baselines by a large margin. For example, it achieves a 13% average performance improvement over the strongest unsupervised baseline in 1-shot intent classification on 6 datasets. We also provide analyses of the benefits and limitations of our model.
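The core training signal can be sketched in a few lines: consecutive utterances from the same dialogue form positive pairs, with the rest of the batch as negatives. The encoder, pooling, and temperature below are placeholders rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def dse_contrastive_loss(encoder, utt_a, utt_b, temperature=0.05):
    """Contrastive loss over consecutive-utterance pairs: utt_a[i] and utt_b[i]
    are adjacent turns from the same dialogue (positive pair); all other
    in-batch combinations serve as negatives."""
    z_a = F.normalize(encoder(utt_a), dim=-1)   # (batch, dim)
    z_b = F.normalize(encoder(utt_b), dim=-1)
    logits = z_a @ z_b.t() / temperature        # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric InfoNCE: match a -> b and b -> a.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```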

* NAACL 2022 main conference 

QaNER: Prompting Question Answering Models for Few-shot Named Entity Recognition

Mar 04, 2022
Andy T. Liu, Wei Xiao, Henghui Zhu, Dejiao Zhang, Shang-Wen Li, Andrew Arnold

Recently, prompt-based learning for pre-trained language models has succeeded in few-shot Named Entity Recognition (NER) by exploiting prompts as task guidance to increase label efficiency. However, previous prompt-based methods for few-shot NER have limitations such as high computational complexity, poor zero-shot ability, a need for manual prompt engineering, and a lack of prompt robustness. In this work, we address these shortcomings by proposing a new prompt-based NER method built on Question Answering (QA), called QaNER. Our approach includes 1) a refined strategy for converting NER problems into the QA formulation; 2) NER prompt generation for QA models; 3) prompt-based tuning with QA models on a few annotated NER examples; 4) zero-shot NER by prompting the QA model. Compared with previous methods, QaNER is faster at inference, insensitive to prompt quality, and robust to hyper-parameters, while also demonstrating significantly better low-resource performance and zero-shot capability.
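A toy sketch of the NER-as-QA formulation is shown below, using a generic extractive QA pipeline. The question templates, model checkpoint, score threshold, and the one-answer-per-type simplification are all illustrative assumptions; the paper's prompts and multi-entity handling differ.

```python
from transformers import pipeline

# Hypothetical entity-type-to-question templates.
ENTITY_QUESTIONS = {
    "PER": "Which person is mentioned in the text?",
    "ORG": "Which organization is mentioned in the text?",
    "LOC": "Which location is mentioned in the text?",
}

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

def ner_via_qa(text, score_threshold=0.3):
    """Cast NER as extractive QA: ask one question per entity type and keep
    answer spans whose confidence exceeds a threshold."""
    entities = []
    for etype, question in ENTITY_QUESTIONS.items():
        pred = qa(question=question, context=text)
        if pred["score"] >= score_threshold:
            entities.append((etype, pred["answer"], pred["start"], pred["end"]))
    return entities

entities = ner_via_qa("Dejiao Zhang works at Amazon in Seattle.")
```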

* 8 pages, 6 figures 

Virtual Augmentation Supported Contrastive Learning of Sentence Representations

Oct 16, 2021
Dejiao Zhang, Wei Xiao, Henghui Zhu, Xiaofei Ma, Andrew O. Arnold

Despite profound successes, contrastive representation learning relies on carefully designed data augmentations built with domain-specific knowledge. This challenge is magnified in natural language processing, where no general rules exist for data augmentation due to the discrete nature of natural language. We tackle this challenge by presenting Virtual augmentation Supported Contrastive Learning of sentence representations (VaSCL). Starting from the interpretation that data augmentation essentially constructs a neighborhood around each training instance, we in turn utilize the neighborhood to generate effective data augmentations. Leveraging the large training batch size of contrastive learning, we approximate the neighborhood of an instance via its K-nearest in-batch neighbors in the representation space. We then define an instance discrimination task within this neighborhood and generate the virtual augmentation in an adversarial training manner. We assess the performance of VaSCL on a wide range of downstream tasks and set a new state-of-the-art for unsupervised sentence representation learning.
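The sketch below illustrates the general idea in PyTorch: approximate each instance's neighborhood with its K-nearest in-batch neighbors, then take a one-step adversarial perturbation of the representation that makes neighborhood-level instance discrimination harder. The specific perturbation update, hyperparameters, and loss form are assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def virtual_augmentation(z, k=8, epsilon=1e-2, temperature=0.05):
    """Neighborhood-based virtual augmentation (sketch): z holds (batch, dim)
    sentence representations; the returned tensor is an adversarially
    perturbed copy of them."""
    z = F.normalize(z, dim=-1)
    with torch.no_grad():                             # neighbor selection needs no gradients
        sim = z @ z.t()
        sim.fill_diagonal_(float("-inf"))
        nbr_idx = sim.topk(k, dim=-1).indices         # (batch, k) nearest in-batch neighbors

    delta = torch.zeros_like(z, requires_grad=True)   # the virtual augmentation to be learned
    z_aug = F.normalize(z + delta, dim=-1)

    # The perturbed view should still match its own clean representation
    # against its k nearest neighbors (neighborhood instance discrimination).
    pos = (z_aug * z).sum(-1, keepdim=True) / temperature          # (batch, 1)
    neg = torch.gather(z_aug @ z.t(), 1, nbr_idx) / temperature    # (batch, k)
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(z.size(0), dtype=torch.long, device=z.device)
    loss = F.cross_entropy(logits, labels)

    grad, = torch.autograd.grad(loss, delta)          # one-step, FGSM-style adversarial update
    return (z + epsilon * F.normalize(grad, dim=-1)).detach()
```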

* 8 pages, 3 figures, 3 tables 

Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora

Oct 16, 2021
Xisen Jin, Dejiao Zhang, Henghui Zhu, Wei Xiao, Shang-Wen Li, Xiaokai Wei, Andrew Arnold, Xiang Ren

Pretrained language models (PTLMs) are typically learned over a large, static corpus and further fine-tuned for various downstream tasks. However, when deployed in the real world, a PTLM-based model must deal with data from a new domain that deviates from what the PTLM was initially trained on, or with newly emerged data that contains out-of-distribution information. In this paper, we study a lifelong language model pretraining challenge where a PTLM is continually updated to adapt to emerging data. Over a domain-incremental research paper stream and a chronologically ordered tweet stream, we incrementally pretrain a PTLM with different continual learning algorithms and track downstream task performance (after fine-tuning) to analyze its ability to acquire new knowledge and preserve learned knowledge. Our experiments show that continual learning algorithms improve knowledge preservation, with logit distillation being the most effective approach. We further show that continual pretraining improves generalization when the training and testing data of downstream tasks are drawn from different time steps, but not when they are drawn from the same time step. We believe our problem formulation, methods, and analysis will inspire future studies toward continual pretraining of language models.
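As an illustration of the logit-distillation variant found to be most effective, here is a hedged sketch that combines the LM loss on new data with a KL term against a frozen copy of the earlier model. It assumes a HuggingFace-style model returning `.loss` and `.logits`; the temperature and weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_pretrain_loss(student, teacher, batch, temperature=2.0, lam=0.5):
    """Continual pretraining with logit distillation (sketch): the usual LM loss
    on the emerging corpus plus a KL term that keeps the updated model's output
    distribution close to a frozen copy trained on earlier data."""
    out = student(**batch)                            # batch contains labels for the LM loss
    with torch.no_grad():
        teacher_out = teacher(**{k: v for k, v in batch.items() if k != "labels"})

    kd = F.kl_div(
        F.log_softmax(out.logits / temperature, dim=-1),
        F.softmax(teacher_out.logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return out.loss + lam * kd
```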

* 8 pages 

Knowledge Enhanced Pretrained Language Models: A Comprehensive Survey

Oct 16, 2021
Xiaokai Wei, Shen Wang, Dejiao Zhang, Parminder Bhatia, Andrew Arnold

Pretrained Language Models (PLMs) have established a new paradigm by learning informative contextualized representations from large-scale text corpora. This new paradigm has revolutionized the entire field of natural language processing and set new state-of-the-art performance for a wide variety of NLP tasks. However, although PLMs can store certain knowledge and facts from their training corpora, their knowledge awareness is still far from satisfactory. To address this issue, integrating knowledge into PLMs has recently become a very active research area, and a variety of approaches have been developed. In this paper, we provide a comprehensive survey of the literature on this emerging and fast-growing field: Knowledge Enhanced Pretrained Language Models (KE-PLMs). We introduce three taxonomies to categorize existing work. We also survey the various NLU and NLG applications on which KE-PLMs have demonstrated superior performance over vanilla PLMs. Finally, we discuss the challenges facing KE-PLMs as well as promising directions for future research.

Pairwise Supervised Contrastive Learning of Sentence Representations

Sep 12, 2021
Dejiao Zhang, Shang-Wen Li, Wei Xiao, Henghui Zhu, Ramesh Nallapati, Andrew O. Arnold, Bing Xiang

Many recent successes in sentence representation learning have been achieved by simply fine-tuning on Natural Language Inference (NLI) datasets with a triplet or Siamese loss. Nevertheless, these approaches share a common weakness: the sentences in a contradiction pair are not necessarily from different semantic categories. Therefore, optimizing the semantic entailment and contradiction reasoning objective alone is inadequate for capturing high-level semantic structure. This drawback is compounded by the fact that the vanilla Siamese or triplet losses learn only from individual sentence pairs or triplets, which often leads to bad local optima. In this paper, we propose PairSupCon, an instance-discrimination-based approach that aims to bridge semantic entailment and contradiction understanding with high-level categorical concept encoding. We evaluate PairSupCon on various downstream tasks that involve understanding sentence semantics at different granularities. We outperform the previous state-of-the-art method with a 10%-13% averaged improvement on eight clustering tasks and a 5%-6% averaged improvement on seven semantic textual similarity (STS) tasks.
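A simplified sketch of the instance-discrimination ingredient is given below: entailment hypotheses serve as positives for their premises, while contradiction hypotheses and the rest of the batch act as negatives. This omits the categorical-concept component, and the names and hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

def pairwise_instance_discrimination(encoder, premises, hypotheses, labels, temperature=0.05):
    """Instance discrimination over NLI pairs (simplified sketch).
    labels: 1 = entailment pair, 0 = contradiction pair."""
    z_p = F.normalize(encoder(premises), dim=-1)      # (batch, dim)
    z_h = F.normalize(encoder(hypotheses), dim=-1)
    logits = z_p @ z_h.t() / temperature              # premise-vs-all-hypotheses similarities

    targets = torch.arange(logits.size(0), device=logits.device)
    entail = labels == 1                              # only entailment premises have a positive;
    return F.cross_entropy(logits[entail], targets[entail])  # contradictions stay as hard negatives
```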

* 9 pages, EMNLP 2021 

Improving Factual Consistency of Abstractive Summarization via Question Answering

May 10, 2021
Feng Nan, Cicero Nogueira dos Santos, Henghui Zhu, Patrick Ng, Kathleen McKeown, Ramesh Nallapati, Dejiao Zhang, Zhiguo Wang, Andrew O. Arnold, Bing Xiang

A commonly observed problem with state-of-the-art abstractive summarization models is that the generated summaries can be factually inconsistent with the input documents. The fact that automatic summarization may produce plausible-sounding yet inaccurate summaries is a major concern that limits its wide application. In this paper we present an approach to address factual consistency in summarization. We first propose an efficient automatic evaluation metric to measure factual consistency; next, we propose a novel learning algorithm that maximizes the proposed metric during model training. Through extensive experiments, we confirm that our method is effective in improving the factual consistency and even the overall quality of the summaries, as judged by both automatic metrics and human evaluation.
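To make the QA-based evaluation idea concrete, here is a generic QAGS-style consistency check: generate questions from the summary, answer them against both the summary and the source document, and score answer agreement. This sketches the general family of metrics, not the paper's specific metric or training objective; `generate_questions` is a hypothetical helper and the QA checkpoint is an arbitrary choice.

```python
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

def qa_consistency_score(document, summary, generate_questions):
    """Fraction of summary-derived questions whose answers agree between the
    summary and the source document. `generate_questions` is assumed to be a
    callable returning a list of question strings."""
    scores = []
    for q in generate_questions(summary):
        ans_sum = qa(question=q, context=summary)["answer"].strip().lower()
        ans_doc = qa(question=q, context=document)["answer"].strip().lower()
        scores.append(float(ans_sum == ans_doc))
    return sum(scores) / max(len(scores), 1)
```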

* ACL-IJCNLP 2021 

Supporting Clustering with Contrastive Learning

Mar 24, 2021
Dejiao Zhang, Feng Nan, Xiaokai Wei, Shangwen Li, Henghui Zhu, Kathleen McKeown, Ramesh Nallapati, Andrew Arnold, Bing Xiang

Unsupervised clustering aims to discover the semantic categories of data according to some distance measured in the representation space. However, different categories often overlap with one another in the representation space at the beginning of the learning process, which poses a significant challenge for distance-based clustering to achieve good separation between categories. To this end, we propose Supporting Clustering with Contrastive Learning (SCCL), a novel framework that leverages contrastive learning to promote better separation. We assess the performance of SCCL on short text clustering and show that SCCL significantly advances the state-of-the-art results on most benchmark datasets, with 3%-11% improvement on accuracy and 4%-15% improvement on Normalized Mutual Information. Furthermore, our quantitative analysis demonstrates the effectiveness of SCCL in leveraging the strengths of both bottom-up instance discrimination and top-down clustering to achieve better intra-cluster and inter-cluster distances when evaluated with the ground-truth cluster labels.
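A compact sketch of a joint objective in this spirit is shown below: an instance-wise contrastive term over two augmented views combined with a DEC-style soft-clustering term that sharpens cluster assignments. The augmentations, Student's-t assignments, and loss weighting are illustrative assumptions, not SCCL's exact formulation.

```python
import torch
import torch.nn.functional as F

def clustering_with_contrastive_loss(encoder, cluster_centers, view1, view2,
                                     temperature=0.5, alpha=1.0):
    """Joint objective (sketch): symmetric InfoNCE over two augmented views plus
    a clustering term with Student's-t soft assignments sharpened toward a
    target distribution."""
    z1, z2 = encoder(view1), encoder(view2)            # (batch, dim)

    # Instance-wise contrastive term across the two views.
    p1, p2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = p1 @ p2.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    contrast = 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    # Soft cluster assignments q and the sharpened target distribution p.
    dist = torch.cdist(z1, cluster_centers) ** 2       # (batch, n_clusters)
    q = (1.0 + dist).reciprocal()
    q = q / q.sum(dim=1, keepdim=True)
    weight = q ** 2 / q.sum(dim=0, keepdim=True)
    p = weight / weight.sum(dim=1, keepdim=True)
    cluster = F.kl_div(q.log(), p.detach(), reduction="batchmean")

    return contrast + alpha * cluster
```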

* NAACL 2021 