Sujian Li

PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training

Sep 19, 2023
Dawei Zhu, Nan Yang, Liang Wang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li

In this paper, we introduce Positional Skip-wisE (PoSE) training for efficient adaptation of large language models (LLMs) to extremely long context windows. PoSE decouples training length from the target context window size by simulating long inputs within a fixed context window using manipulated position indices during training. Concretely, we select several short chunks from a long input sequence and introduce distinct skipping bias terms to modify the position indices of each chunk. These bias terms, along with the length of each chunk, are altered for each training example, allowing the model to adapt to all positions within the target context window without training on full-length inputs. Experiments show that, compared with fine-tuning on the full length, PoSE greatly reduces memory and time overhead with minimal impact on performance. Leveraging this advantage, we have successfully extended the LLaMA model to 128k tokens. Furthermore, we empirically confirm that PoSE is compatible with all RoPE-based LLMs and various position interpolation strategies. Notably, by decoupling fine-tuning length from the target context window, PoSE can theoretically extend the context window infinitely, constrained only by memory usage during inference. With ongoing advancements in efficient inference, we believe PoSE holds great promise for scaling the context window even further.
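
Below is a minimal, illustrative sketch of the position-index manipulation the abstract describes: a fixed training window is split into chunks, and each chunk's indices are shifted by a randomly sampled skipping bias so that, across examples, they cover positions throughout the much larger target window. It is not the authors' code; the two-chunk split, sampling ranges, and default lengths are assumptions for illustration.

```python
import random

def pose_position_ids(train_len=2048, target_len=16384):
    """Return manipulated position indices for one training example."""
    # Randomly split the fixed training window into two contiguous chunks.
    cut = random.randint(1, train_len - 1)
    chunk_lens = [cut, train_len - cut]

    position_ids, used = [], 0
    for i, clen in enumerate(chunk_lens):
        # Sample a skipping bias so this chunk's indices land inside the
        # (much larger) target context window without overlapping later chunks.
        remaining = sum(chunk_lens[i + 1:])
        start = random.randint(used, target_len - remaining - clen)
        position_ids.extend(range(start, start + clen))
        used = start + clen
    return position_ids

ids = pose_position_ids()
print(len(ids), min(ids), max(ids))  # 2048 indices spread over 0..16383
```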


RestGPT: Connecting Large Language Models with Real-World Applications via RESTful APIs

Jun 11, 2023
Yifan Song, Weimin Xiong, Dawei Zhu, Cheng Li, Ke Wang, Ye Tian, Sujian Li

Tool-augmented large language models (LLMs) have achieved remarkable progress in tackling a broad range of queries. However, existing work is still at the experimental stage and has limited extensibility and robustness, especially when facing real-world applications. In this paper, we consider a more realistic scenario: connecting LLMs with RESTful APIs, which follow the widely adopted REST software architectural style for web service development. To address the practical challenges of planning and API usage, we introduce RestGPT, which leverages LLMs to solve user requests by connecting with RESTful APIs. Specifically, we propose a coarse-to-fine online planning mechanism to improve planning and API selection. For the complex scenario of calling RESTful APIs, we also design a dedicated API executor to formulate parameters and parse API responses. Experiments show that RestGPT achieves impressive results on complex tasks and exhibits strong robustness, which paves a new way towards AGI.
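
The loop below is a highly simplified sketch of the coarse-to-fine planner/executor pipeline the abstract outlines, not the released RestGPT implementation: the `llm` stand-in, prompt wording, and "METHOD|URL|JSON-params" convention are placeholders introduced purely for illustration.

```python
import json
import requests

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in any chat-completion call here")

def solve(user_request: str, api_docs: str, max_steps: int = 5):
    history = []
    for _ in range(max_steps):
        # Coarse planner: decide the next natural-language sub-task.
        sub_task = llm(f"Request: {user_request}\nDone: {history}\n"
                       "Next sub-task (or FINISH):").strip()
        if sub_task == "FINISH":
            break
        # Fine-grained API selector: choose a RESTful endpoint and its parameters.
        call = llm(f"Sub-task: {sub_task}\nAPI docs: {api_docs}\n"
                   "Answer as METHOD|URL|JSON-params:")
        method, url, params = call.split("|", 2)
        # API executor: issue the call, then let the LLM parse the raw response.
        resp = requests.request(method.strip(), url.strip(), json=json.loads(params))
        result = llm(f"Sub-task: {sub_task}\nResponse: {resp.text[:2000]}\n"
                     "Extract the relevant result:")
        history.append((sub_task, result))
    return history
```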

* Work in progress 

Contrastive Bootstrapping for Label Refinement

Jun 07, 2023
Shudi Hou, Yu Xia, Muhao Chen, Sujian Li

Traditional text classification typically categorizes texts into pre-defined coarse-grained classes, so the resulting models cannot handle the real-world scenario in which finer-grained categories emerge periodically and are needed for accurate services. In this work, we investigate the setting where fine-grained classification is performed using only the annotations of coarse-grained categories and the coarse-to-fine mapping. We propose a lightweight contrastive clustering-based bootstrapping method that iteratively refines the labels of passages. During clustering, it pushes away negative passage-prototype pairs under the guidance of the mapping, from both global and local perspectives. Experiments on NYT and 20News show that our method outperforms state-of-the-art methods by a large margin.
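
As a concrete illustration of pushing away negative passage-prototype pairs under a coarse-to-fine mapping, here is a toy prototype-based contrastive loss. It is an assumed formulation, not the paper's implementation; the masking scheme (only prototypes from other coarse classes act as negatives) and the temperature are illustrative choices.

```python
import torch
import torch.nn.functional as F

def proto_contrastive_loss(passage_emb, prototypes, fine_label, coarse_of_fine, temp=0.1):
    """passage_emb: (d,); prototypes: (num_fine, d); fine_label: int;
    coarse_of_fine: (num_fine,) coarse class id of each fine class."""
    sims = F.cosine_similarity(passage_emb.unsqueeze(0), prototypes) / temp  # (num_fine,)
    # Mapping guidance: keep only the positive prototype and prototypes that
    # belong to a different coarse class (the negatives to push away).
    keep = coarse_of_fine != coarse_of_fine[fine_label]
    keep[fine_label] = True
    logits = sims[keep]
    target = keep.nonzero(as_tuple=True)[0].tolist().index(fine_label)
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([target]))
```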

* ACL 2023 

RepCL: Exploring Effective Representation for Continual Text Classification

May 12, 2023
Yifan Song, Peiyi Wang, Dawei Zhu, Tianyu Liu, Zhifang Sui, Sujian Li

Continual learning (CL) aims to constantly learn new knowledge over time while avoiding catastrophic forgetting on old tasks. In this work, we focus on continual text classification under the class-incremental setting. Recent CL studies find that the representations learned in one task may not be effective for other tasks, namely the representation bias problem. For the first time, we formally analyze representation bias from an information bottleneck perspective and suggest that exploiting representations with more class-relevant information could alleviate the bias. To this end, we propose RepCL, a novel replay-based continual text classification method. Our approach utilizes contrastive and generative representation learning objectives to capture more class-relevant features. In addition, RepCL introduces an adversarial replay strategy to alleviate the overfitting problem of replay. Experiments demonstrate that RepCL effectively alleviates forgetting and achieves state-of-the-art performance on three text classification tasks.
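
As a rough sketch of what "capturing more class-relevant features" can look like in a replay-based setup, the snippet below adds a supervised contrastive term (same-label examples in the mixed new-plus-replayed batch act as positives) on top of cross-entropy. The exact RepCL objectives and its adversarial replay strategy are not reproduced; the temperature and loss weight are assumptions.

```python
import torch
import torch.nn.functional as F

def supcon(reps, labels, temp=0.1):
    """Supervised contrastive loss over a batch: same-label pairs are positives."""
    reps = F.normalize(reps, dim=-1)
    sims = reps @ reps.T / temp
    mask_pos = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    mask_pos.fill_diagonal_(0)                      # self-pairs are not positives
    logits = sims - torch.eye(len(reps)) * 1e9      # drop self from the denominator
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    denom = mask_pos.sum(1).clamp(min=1)
    return -(mask_pos * log_prob).sum(1).div(denom).mean()

def continual_step(logits, reps, labels, alpha=0.5):
    # Cross-entropy on the mixed new + replayed batch, plus the representation term.
    return F.cross_entropy(logits, labels) + alpha * supcon(reps, labels)
```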


DocRED-FE: A Document-Level Fine-Grained Entity And Relation Extraction Dataset

Mar 21, 2023
Hongbo Wang, Weimin Xiong, Yifan Song, Dawei Zhu, Yu Xia, Sujian Li

Joint entity and relation extraction (JERE) is one of the most important tasks in information extraction. However, most existing works focus on sentence-level, coarse-grained JERE, which has limitations in real-world scenarios. In this paper, we construct DocRED-FE, a large-scale document-level fine-grained JERE dataset that improves DocRED with fine-grained entity types. Specifically, we redesign a hierarchical entity type schema including 11 coarse-grained types and 119 fine-grained types, and then manually re-annotate DocRED according to this schema. Through comprehensive experiments we find that: (1) DocRED-FE is challenging for existing JERE models; (2) our fine-grained entity types promote relation classification. We make DocRED-FE, with its annotation instructions and the code for our baselines, publicly available at https://github.com/PKU-TANGENT/DOCRED-FE.

* Accepted by IEEE ICASSP 2023. The first two authors contributed equally 

Improving Sentence Similarity Estimation for Unsupervised Extractive Summarization

Feb 24, 2023
Shichao Sun, Ruifeng Yuan, Wenjie Li, Sujian Li

Unsupervised extractive summarization aims to extract salient sentences from a document as the summary without labeled data. Recent literature mostly studies how to leverage sentence similarity to rank sentences in order of salience. However, sentence similarity estimation using pre-trained language models mostly takes little account of document-level information and correlates weakly with sentence salience ranking. In this paper, we propose two novel strategies to improve sentence similarity estimation for unsupervised extractive summarization. We use contrastive learning to optimize a document-level objective that sentences from the same document should be more similar than those from different documents. Moreover, we use mutual learning to strengthen the relationship between sentence similarity estimation and sentence salience ranking, where an extra signal amplifier is used to refine the pivotal information. Experimental results demonstrate the effectiveness of our strategies.
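
A minimal sketch of the document-level contrastive objective stated above, written in a standard InfoNCE form (the paper's exact loss and its mutual-learning component are not reproduced): an anchor sentence should be closer to a sentence from the same document than to sentences drawn from other documents.

```python
import torch
import torch.nn.functional as F

def doc_level_contrastive(anchor, positive, negatives, temp=0.05):
    """anchor, positive: (d,) sentence embeddings from the same document;
    negatives: (n, d) sentence embeddings from other documents."""
    pos = F.cosine_similarity(anchor, positive, dim=0).unsqueeze(0)   # (1,)
    neg = F.cosine_similarity(anchor.unsqueeze(0), negatives)         # (n,)
    logits = torch.cat([pos, neg]) / temp
    # The same-document pair sits at index 0 of the softmax.
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([0]))
```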

* Accepted by ICASSP 2023 

WeCheck: Strong Factual Consistency Checker via Weakly Supervised Learning

Dec 20, 2022
Wenhao Wu, Wei Li, Xinyan Xiao, Jiachen Liu, Sujian Li, Yajuan Lv

A crucial issue with current text generation models is that they often uncontrollably generate text that is factually inconsistent with their inputs. Limited by the lack of annotated data, existing work on evaluating factual consistency directly transfers the reasoning ability of models trained on other data-rich upstream tasks, such as question answering (QA) and natural language inference (NLI), without any further adaptation. As a result, these methods perform poorly on real generated text and are heavily biased by their single-source upstream tasks. To alleviate this problem, we propose WeCheck, a weakly supervised framework that aggregates multiple resources to train a precise and efficient factual metric. WeCheck first utilizes a generative model to accurately label a real generated sample by aggregating its weak labels, which are inferred from multiple resources. Then, we train the target metric model with this weak supervision while taking noise into consideration. Comprehensive experiments on a variety of tasks demonstrate the strong performance of WeCheck, which achieves a 3.4% absolute improvement over previous state-of-the-art methods on the TRUE benchmark on average.
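
The toy snippet below illustrates the general weak-supervision recipe the abstract describes, not the WeCheck code: several imperfect checkers (e.g. NLI- and QA-based metrics) each score a generated sample, the scores are aggregated into a soft pseudo-label, and a target metric model is trained against that label with a noise-tolerant soft cross-entropy. The reliability-weighted aggregation and the loss form are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def aggregate_weak_labels(weak_scores, reliabilities):
    """weak_scores: (num_sources,) consistency scores in [0, 1] from weak checkers;
    reliabilities: (num_sources,) trust placed in each source."""
    w = reliabilities / reliabilities.sum()
    return (w * weak_scores).sum()                  # soft pseudo-label in [0, 1]

def noise_aware_loss(logit, soft_label):
    # Binary cross-entropy against the soft (possibly noisy) aggregated label.
    return F.binary_cross_entropy_with_logits(logit, soft_label)

scores = torch.tensor([0.9, 0.4, 0.7])              # e.g. NLI, QA, another metric
trust = torch.tensor([2.0, 1.0, 1.0])
pseudo = aggregate_weak_labels(scores, trust)
loss = noise_aware_loss(torch.tensor(0.3), pseudo)
```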


Consecutive Question Generation via Dynamic Multitask Learning

Nov 16, 2022
Yunji Li, Sujian Li, Xing Shi

In this paper, we propose the task of consecutive question generation (CQG), which generates a set of logically related question-answer pairs to understand a whole passage, with comprehensive consideration of accuracy, coverage, and informativeness. To achieve this, we first examine the four key elements of CQG, i.e., question, answer, rationale, and context history, and propose a novel dynamic multitask framework with one main task that generates a question-answer pair and four auxiliary tasks that generate the other elements. It directly helps the model generate good questions through both joint training and self-reranking. At the same time, to fully exploit the information worth asking about in a given passage, we use the reranking losses to sample rationales and search for the best question series globally. Finally, we evaluate our strategy via QA data augmentation and manual evaluation, as well as a novel application of the generated question-answer pairs to DocNLI. We show that our strategy can improve question generation significantly and benefit multiple related NLP tasks.

* Findings of EMNLP 2022 

FRSUM: Towards Faithful Abstractive Summarization via Enhancing Factual Robustness

Nov 01, 2022
Wenhao Wu, Wei Li, Jiachen Liu, Xinyan Xiao, Ziqiang Cao, Sujian Li, Hua Wu

Despite being able to generate fluent and grammatical text, current Seq2Seq summarization models still suffer from the unfaithful generation problem. In this paper, we study the faithfulness of existing systems from a new perspective of factual robustness, i.e., the ability to correctly generate factual information in the presence of adversarial unfaithful information. We first measure a model's factual robustness by its success rate in defending against adversarial attacks when generating factual information. A factual robustness analysis of a wide range of current systems shows good consistency with human judgments of faithfulness. Inspired by these findings, we propose to improve the faithfulness of a model by enhancing its factual robustness. Specifically, we propose a novel training strategy, FRSUM, which teaches the model to defend against both explicit adversarial samples and implicit factual adversarial perturbations. Extensive automatic and human evaluation results show that FRSUM consistently improves the faithfulness of various Seq2Seq models, such as T5 and BART.
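
For concreteness, here is a generic embedding-level adversarial training step in the spirit of "implicit factual adversarial perturbations", assuming a HuggingFace-style Seq2Seq interface (inputs_embeds, labels, .loss); the paper's actual perturbation construction and loss may differ.

```python
import torch

def adversarial_step(model, input_embeds, labels, epsilon=1e-2):
    input_embeds = input_embeds.detach().requires_grad_(True)
    clean_loss = model(inputs_embeds=input_embeds, labels=labels).loss
    clean_loss.backward()
    # Perturb the embeddings in the direction that most increases the loss,
    # then train the model to remain faithful under that perturbation.
    delta = epsilon * input_embeds.grad.sign()
    adv_loss = model(inputs_embeds=(input_embeds + delta).detach(), labels=labels).loss
    return clean_loss.detach() + adv_loss
```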

* Findings of EMNLP 2022 

IntTower: the Next Generation of Two-Tower Model for Pre-Ranking System

Oct 18, 2022
Xiangyang Li, Bo Chen, HuiFeng Guo, Jingjie Li, Chenxu Zhu, Xiang Long, Sujian Li, Yichao Wang, Wei Guo, Longxia Mao, Jinxing Liu, Zhenhua Dong, Ruiming Tang

Scoring a large number of candidates precisely within a few milliseconds is vital for industrial pre-ranking systems. Existing pre-ranking systems primarily adopt the two-tower model, since its "user-item decoupling architecture" paradigm balances efficiency and effectiveness. However, the cost of this high efficiency is the neglect of potential information interaction between the user and item towers, which severely hinders prediction accuracy. In this paper, we show that it is possible to design a two-tower model that emphasizes both information interaction and inference efficiency. The proposed model, IntTower (short for Interaction-enhanced Two-Tower), consists of Light-SE, FE-Block, and CIR modules. Specifically, the lightweight Light-SE module identifies the importance of different features and obtains refined feature representations in each tower. The FE-Block module performs fine-grained, early feature interactions to explicitly capture interactive signals between the user and item towers, and the CIR module leverages a contrastive interaction regularization to further enhance the interactions implicitly. Experimental results on three public datasets show that IntTower significantly outperforms SOTA pre-ranking models and even achieves performance comparable to ranking models. Moreover, we further verify the effectiveness of IntTower on a large-scale advertisement pre-ranking system. The code of IntTower is publicly available at https://github.com/archersama/IntTower.
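
The sketch below illustrates the "early interaction" idea the abstract highlights, not the released IntTower code: each tower exposes intermediate layer outputs, and user-item relevance is scored from dot products between those intermediate representations rather than only the final embeddings. Layer sizes and the scoring head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ToyIntTower(nn.Module):
    def __init__(self, user_dim, item_dim, hidden=64):
        super().__init__()
        self.user_layers = nn.ModuleList([nn.Linear(user_dim, hidden), nn.Linear(hidden, hidden)])
        self.item_layers = nn.ModuleList([nn.Linear(item_dim, hidden), nn.Linear(hidden, hidden)])

    def forward(self, user_x, item_x):
        scores, u, v = [], user_x, item_x
        for ul, il in zip(self.user_layers, self.item_layers):
            u, v = torch.relu(ul(u)), torch.relu(il(v))
            # Fine-grained early interaction: match intermediate user/item
            # representations layer by layer, not just the final outputs.
            scores.append((u * v).sum(dim=-1))
        return torch.stack(scores, dim=-1).sum(dim=-1)   # (batch,)

model = ToyIntTower(user_dim=16, item_dim=16)
print(model(torch.randn(4, 16), torch.randn(4, 16)).shape)  # torch.Size([4])
```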

* Accepted by CIKM 2022; DLP-KDD best paper 