Lei Shu

Nanjing Agricultural University

RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting

May 25, 2023

Lei Shu, Liangchen Luo, Jayakumar Hoskere, Yun Zhu, Canoee Liu, Simon Tong, Jindong Chen, Lei Meng

Abstract: Large Language Models (LLMs) have demonstrated impressive zero-shot capabilities in long-form text generation tasks expressed through natural language instructions. However, user expectations for long-form text rewriting are high, and unintended rewrites ("hallucinations") produced by the model can negatively impact its overall performance. Existing evaluation benchmarks primarily focus on limited rewriting styles and sentence-level rewriting rather than long-form open-ended rewriting. We introduce OpenRewriteEval, a novel benchmark that covers a wide variety of rewriting types expressed through natural language instructions. It is specifically designed to facilitate the evaluation of open-ended rewriting of long-form texts. In addition, we propose a strong baseline model, RewriteLM, an instruction-tuned large language model for long-form text rewriting. We develop new strategies that facilitate the generation of diverse instructions and preference data with minimal human intervention. We conduct empirical experiments and demonstrate that our model outperforms the current state-of-the-art LLMs in text rewriting. Specifically, it excels in preserving the essential content and meaning of the source text and in minimizing the generation of "hallucinated" content, while showcasing the ability to generate rewrites with diverse wording and structures.

Adapting a Language Model While Preserving its General Knowledge

Jan 21, 2023

Zixuan Ke, Yijia Shao, Haowei Lin, Hu Xu, Lei Shu, Bing Liu

Abstract: Domain-adaptive pre-training (DA-training for short), also known as post-training, trains a pre-trained general-purpose language model (LM) on an unlabeled corpus of a particular domain so that end-tasks in that domain achieve improved performance. However, existing DA-training methods are in some sense blind, as they do not explicitly identify what knowledge in the LM should be preserved and what should be changed by the domain corpus. This paper shows that the existing methods are suboptimal and proposes a novel method to perform a more informed adaptation of the knowledge in the LM by (1) soft-masking the attention heads based on their importance to best preserve the general knowledge in the LM and (2) contrasting the representations of the general knowledge and the full knowledge (both general and domain knowledge) to learn an integrated representation with both general and domain-specific knowledge. Experimental results demonstrate the effectiveness of the proposed approach.

* EMNLP 2022
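
The soft-masking idea in the abstract above can be illustrated with a minimal sketch. It assumes that each attention head has an importance score in [0, 1] and that protection is applied by scaling the head's gradient by one minus that score. The function name `soft_mask_gradients`, the toy gradients, and the scores are hypothetical and are not taken from the authors' implementation.

```python
from typing import List


def soft_mask_gradients(head_grads: List[List[float]],
                        importance: List[float]) -> List[List[float]]:
    """Scale each head's gradient by (1 - importance).

    importance[i] is assumed to lie in [0, 1]: 1 means the head is fully
    protected (no update), 0 means it is updated as usual.
    """
    if len(head_grads) != len(importance):
        raise ValueError("one importance score is required per head")
    masked = []
    for grads, score in zip(head_grads, importance):
        if not 0.0 <= score <= 1.0:
            raise ValueError("importance scores must be in [0, 1]")
        masked.append([g * (1.0 - score) for g in grads])
    return masked


# Hypothetical gradients for three heads and their importance scores.
grads = [[0.5, -0.5], [1.0, 2.0], [0.25, 0.75]]
scores = [1.0, 0.0, 0.5]
masked_grads = soft_mask_gradients(grads, scores)
```

In this toy run the first head receives no update, the second is updated unchanged, and the third receives half of its gradient, which is the "soft" rather than binary behaviour the abstract alludes to.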

Continual Training of Language Models for Few-Shot Learning

Oct 11, 2022

Zixuan Ke, Haowei Lin, Yijia Shao, Hu Xu, Lei Shu, Bing Liu

Abstract: Recent work on applying large language models (LMs) achieves impressive performance in many NLP applications. Adapting or post-training an LM using an unlabeled domain corpus can produce even better performance for end-tasks in the domain. This paper proposes the problem of continually extending an LM by incrementally post-training it with a sequence of unlabeled domain corpora to expand its knowledge without forgetting its previous skills. The goal is to improve few-shot end-task learning in these domains. The resulting system is called CPT (Continual PostTraining), which, to our knowledge, is the first continual post-training system. Experimental results verify its effectiveness.

* EMNLP 2022

Zero-Shot Aspect-Based Sentiment Analysis

Feb 15, 2022

Lei Shu, Hu Xu, Bing Liu, Jiahua Chen

Abstract: Aspect-based sentiment analysis (ABSA) typically requires in-domain annotated data for supervised training/fine-tuning. It is a big challenge to scale ABSA to a large number of new domains. This paper aims to train a unified model that can perform zero-shot ABSA without using any annotated data for a new domain. We propose a method called contrastive post-training on review Natural Language Inference (CORN). ABSA tasks can then be cast as NLI for zero-shot transfer. We evaluate CORN on ABSA tasks ranging from aspect extraction (AE) and aspect sentiment classification (ASC) to end-to-end aspect-based sentiment analysis (E2E ABSA). The results show that ABSA can be conducted without any human-annotated ABSA data.
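
As a rough illustration of casting an ABSA task as NLI, the sketch below turns a review sentence and an aspect term into premise/hypothesis pairs, one per candidate sentiment label. The hypothesis templates, the function name `absa_to_nli`, and the example review are hypothetical; the abstract does not specify the templates that CORN uses.

```python
from typing import Dict, List

# Hypothetical hypothesis templates, one per sentiment label.
TEMPLATES: Dict[str, str] = {
    "positive": "The {aspect} is good.",
    "negative": "The {aspect} is bad.",
}


def absa_to_nli(review: str, aspect: str) -> List[Dict[str, str]]:
    """Build one (premise, hypothesis) pair per candidate sentiment label."""
    return [
        {
            "label": label,
            "premise": review,
            "hypothesis": template.format(aspect=aspect),
        }
        for label, template in TEMPLATES.items()
    ]


pairs = absa_to_nli("The battery lasts all day.", "battery")
```

An NLI model would then score each pair, and the label whose hypothesis receives the highest entailment score would be taken as the predicted sentiment for the aspect.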

Measuring and Reducing Model Update Regression in Structured Prediction for NLP

Feb 07, 2022

Deng Cai, Elman Mansimov, Yi-An Lai, Yixuan Su, Lei Shu, Yi Zhang

Abstract: Recent advances in deep learning have led to rapid adoption of machine learning based NLP models in a wide range of applications. Despite the continuous gain in accuracy, backward compatibility is also an important aspect for industrial applications, yet it has received little research attention. Backward compatibility requires that the new model does not regress on cases that were correctly handled by its predecessor. This work studies model update regression in structured prediction tasks. We choose syntactic dependency parsing and conversational semantic parsing as representative examples of structured prediction tasks in NLP. First, we measure and analyze model update regression in different model update settings. Next, we explore and benchmark existing techniques for reducing model update regression, including model ensemble and knowledge distillation. We further propose a simple and effective method, Backward-Congruent Re-ranking (BCR), which takes into account the characteristics of structured output. Experiments show that BCR can better mitigate model update regression than model ensemble and knowledge distillation approaches.
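
The definition of backward compatibility above suggests a simple way to count regressions: the share of examples that the old model handled correctly and the new model gets wrong. The sketch below is an assumption-laden illustration, not the paper's evaluation code. The function name `regression_rate` is hypothetical, and each structured output (for example a serialized parse) is assumed to be compared by exact match against the gold output.

```python
from typing import Sequence


def regression_rate(gold: Sequence[str],
                    old_pred: Sequence[str],
                    new_pred: Sequence[str]) -> float:
    """Fraction of examples the old model got right and the new model gets wrong."""
    if not (len(gold) == len(old_pred) == len(new_pred)):
        raise ValueError("all sequences must have the same length")
    if not gold:
        return 0.0
    flips = sum(
        1 for g, o, n in zip(gold, old_pred, new_pred)
        if o == g and n != g
    )
    return flips / len(gold)


# Toy outputs: both models are correct on 3 of 4 examples,
# but the new model breaks one case the old model handled.
gold = ["A", "B", "C", "D"]
old = ["A", "B", "X", "D"]
new = ["A", "Y", "C", "D"]
rate = regression_rate(gold, old, new)
```

The toy data shows why accuracy alone can hide the problem: the two models have the same accuracy, yet one in four examples regresses after the update.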

Continual Learning with Knowledge Transfer for Sentiment Classification

Dec 18, 2021

Zixuan Ke, Bing Liu, Hao Wang, Lei Shu

Abstract: This paper studies continual learning (CL) for sentiment classification (SC). In this setting, the CL system learns a sequence of SC tasks incrementally in a neural network, where each task builds a classifier to classify the sentiment of reviews of a particular product category or domain. Two natural questions are: Can the system transfer the knowledge learned in the past from the previous tasks to the new task to help it learn a better model for the new task? And, can old models for previous tasks be improved in the process as well? This paper proposes a novel technique called KAN to achieve these objectives. KAN can markedly improve the SC accuracy of both the new task and the old tasks via forward and backward knowledge transfer. The effectiveness of KAN is demonstrated through extensive experiments.

* ECML-PKDD 2020

CLASSIC: Continual and Contrastive Learning of Aspect Sentiment Classification Tasks

Dec 05, 2021

Zixuan Ke, Bing Liu, Hu Xu, Lei Shu

Abstract: This paper studies continual learning (CL) of a sequence of aspect sentiment classification (ASC) tasks in a particular CL setting called domain incremental learning (DIL). Each task is from a different domain or product. The DIL setting is particularly suited to ASC because, in testing, the system need not know the task/domain to which the test data belongs. To our knowledge, this setting has not been studied before for ASC. This paper proposes a novel model called CLASSIC. The key novelty is a contrastive continual learning method that enables both knowledge transfer across tasks and knowledge distillation from old tasks to the new task, which eliminates the need for task IDs in testing. Experimental results show the high effectiveness of CLASSIC.

* EMNLP 2021

Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning

Dec 05, 2021

Zixuan Ke, Bing Liu, Nianzu Ma, Hu Xu, Lei Shu

Abstract: Continual learning (CL) learns a sequence of tasks incrementally with the goal of achieving two main objectives: overcoming catastrophic forgetting (CF) and encouraging knowledge transfer (KT) across tasks. However, most existing techniques focus only on overcoming CF and have no mechanism to encourage KT, and thus do not do well in KT. Although several papers have tried to deal with both CF and KT, our experiments show that they suffer from serious CF when the tasks do not have much shared knowledge. Another observation is that most current CL methods do not use pre-trained models, even though such models have been shown to significantly improve end-task performance. For example, in natural language processing, fine-tuning a BERT-like pre-trained language model is one of the most effective approaches. However, for CL, this approach suffers from serious CF. An interesting question is how to make the best use of pre-trained models for CL. This paper proposes a novel model called CTR to solve these problems. Our experimental results demonstrate the effectiveness of CTR.

* NeurIPS 2021

TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning

Nov 09, 2021

Yixuan Su, Fangyu Liu, Zaiqiao Meng, Lei Shu, Ehsan Shareghi, Nigel Collier

Abstract: Masked language models (MLMs) such as BERT and RoBERTa have revolutionized the field of Natural Language Understanding in the past few years. However, existing pre-trained MLMs often output an anisotropic distribution of token representations that occupies a narrow subset of the entire representation space. Such token representations are not ideal, especially for tasks that demand discriminative semantic meanings of distinct tokens. In this work, we propose TaCL (Token-aware Contrastive Learning), a novel continual pre-training approach that encourages BERT to learn an isotropic and discriminative distribution of token representations. TaCL is fully unsupervised and requires no additional data. We extensively test our approach on a wide range of English and Chinese benchmarks. The results show that TaCL brings consistent and notable improvements over the original BERT model. Furthermore, we conduct a detailed analysis to reveal the merits and inner workings of our approach.

* Work in progress
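
One common way to make the notion of an anisotropic representation space concrete is the mean pairwise cosine similarity of token vectors: values near 1 indicate that the vectors crowd into a narrow cone. The sketch below uses this proxy as an assumption; it is not claimed to be the analysis used in the paper, and the function names and toy vectors are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


def mean_pairwise_cosine(vectors: Sequence[Sequence[float]]) -> float:
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("at least two vectors are required")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy "token representations": one set crowded in a single direction,
# one set spread evenly around the origin.
narrow = [[1.0, 0.1], [1.0, 0.0], [1.0, -0.1]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
narrow_score = mean_pairwise_cosine(narrow)
spread_score = mean_pairwise_cosine(spread)
```

Under this proxy, the crowded set scores close to 1 while the evenly spread set scores below 0, so a lower score corresponds to a more isotropic distribution.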

Tea Chrysanthemum Detection under Unstructured Environments Using the TC-YOLO Model

Nov 04, 2021

Chao Qi, Junfeng Gao, Simon Pearson, Helen Harman, Kunjie Chen, Lei Shu

Abstract: Tea chrysanthemum detection at the flowering stage is one of the key components for selective chrysanthemum harvesting robot development. However, it is a challenge to detect flowering chrysanthemums under unstructured field environments given the variations in illumination, occlusion and object scale. In this context, we propose a highly fused and lightweight deep learning architecture based on YOLO for tea chrysanthemum detection (TC-YOLO). First, in the backbone and neck components, the method uses the Cross-Stage Partially Dense Network (CSPDenseNet) as the main network and embeds custom feature fusion modules to guide the gradient flow. In the final head component, the method combines the recursive feature pyramid (RFP) multiscale fusion reflow structure and the Atrous Spatial Pyramid Pooling (ASPP) module with atrous (dilated) convolution to achieve the detection task. The resulting model was tested on 300 field images. On an NVIDIA Tesla P100 GPU, TC-YOLO achieves an average precision (AP) of 92.49% on our own tea chrysanthemum dataset at an inference speed of 47.23 FPS with 416 × 416 input images. In addition, this model (13.6M) can be deployed on a single mobile GPU, and it could be further developed as a perception system for a selective chrysanthemum harvesting robot in the future.
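
For readers who think in latency rather than throughput, the reported 47.23 FPS can be converted to an approximate per-image time. The sketch below assumes images are processed one at a time, so that latency is simply the reciprocal of throughput; the function name `latency_ms` is hypothetical.

```python
def latency_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


reported_fps = 47.23  # inference speed stated in the abstract
per_image_ms = latency_ms(reported_fps)
print(f"{per_image_ms:.2f} ms per image")  # prints "21.17 ms per image"
```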
