Wei Lu

Create and Find Flatness: Building Flat Training Spaces in Advance for Continual Learning

Sep 20, 2023
Wenhang Shi, Yiren Chen, Zhe Zhao, Wei Lu, Kimmo Yan, Xiaoyong Du

Catastrophic forgetting remains a critical challenge in continual learning, where neural networks struggle to retain prior knowledge while assimilating new information. Most existing studies mitigate this issue only when a new task is encountered, overlooking the significance of the preceding learning phase. We therefore shift attention to the current task's learning stage, presenting a novel framework, C&F (Create and Find Flatness), which builds a flat training space for each task in advance. Specifically, during the learning of the current task, our framework adaptively creates a flat region around the minimum in the loss landscape. Subsequently, it finds the parameters' importance to the current task based on their degrees of flatness. When adapting the model to a new task, constraints are applied according to the flatness, and a flat space is simultaneously prepared for the impending task. We theoretically demonstrate the consistency between the created and found flatness. In this manner, our framework not only accommodates ample parameter space for learning new tasks but also preserves the preceding knowledge of earlier tasks. Experimental results exhibit C&F's state-of-the-art performance as a standalone continual learning approach and its efficacy as a framework incorporating other methods. Our code is available at https://github.com/Eric8932/Create-and-Find-Flatness.
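The "find flatness" step can be illustrated with a toy finite-difference probe (this is our sketch under simplifying assumptions, not the paper's implementation): perturb each parameter slightly and treat a large loss change as high importance, since sharp directions carry the current task's knowledge and should be constrained when the next task arrives.

```python
# Toy sketch: estimate per-parameter "flatness" as the loss change under
# a small perturbation. A flatter direction (small change) leaves room
# for new tasks; a sharper one (large change) is important to the
# current task. The quadratic loss below is purely illustrative.

def loss(params):
    # Different curvatures per parameter: the first direction is sharper.
    return 2.0 * params[0] ** 2 + 0.1 * params[1] ** 2

def flatness_importance(params, eps=1e-3):
    base = loss(params)
    importance = []
    for i in range(len(params)):
        perturbed = list(params)
        perturbed[i] += eps
        # Larger loss change => sharper direction => higher importance.
        importance.append(abs(loss(perturbed) - base) / eps)
    return importance

imp = flatness_importance([1.0, 1.0])
# The first parameter sits in the sharper direction, so it matters more.
assert imp[0] > imp[1]
```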

* 10 pages, ECAI 2023 conference 

One Network, Many Masks: Towards More Parameter-Efficient Transfer Learning

Jun 12, 2023
Guangtao Zeng, Peiyuan Zhang, Wei Lu


Fine-tuning pre-trained language models for multiple tasks tends to be expensive in terms of storage. Parameter-efficient transfer learning (PETL) methods have been proposed to address this issue, but they still require a significant number of parameters and considerable storage when applied to a broad range of tasks. To achieve even greater storage reduction, we propose PROPETL, a novel method that enables efficient sharing of a single PETL module, which we call a prototype network (e.g., an adapter, LoRA, or prefix-tuning module), across layers and tasks. We then learn binary masks that select different sub-networks from the shared prototype network and apply them as PETL modules in different layers. We find that the binary masks can capture crucial information from the network, which is often ignored in previous studies. Our work can also be seen as a type of pruning, where we find that overparameterization exists even in seemingly small PETL modules. We evaluate PROPETL on various downstream tasks and show that it can outperform other PETL methods with approximately 10% of the parameter storage they require.
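The storage argument can be made concrete with a back-of-the-envelope sketch (variable names and sizes are ours, not the paper's): one shared prototype parameter vector plus a cheap one-bit mask per layer replaces an independent module per layer.

```python
import random

# Illustrative sketch of prototype sharing: a single "prototype"
# parameter vector is shared across layers, and each layer stores only
# a binary mask that selects its sub-network from the prototype.

random.seed(0)
PROTO_SIZE = 1024
prototype = [random.gauss(0.0, 0.02) for _ in range(PROTO_SIZE)]

def make_mask(size, keep=0.5):
    return [1 if random.random() < keep else 0 for _ in range(size)]

def masked_module(proto, mask):
    # The effective PETL module for one layer: prototype gated by mask.
    return [w * m for w, m in zip(proto, mask)]

masks = {layer: make_mask(PROTO_SIZE) for layer in range(12)}
modules = {layer: masked_module(prototype, masks[layer]) for layer in masks}

# Storage comparison: 12 independent modules store 12 * 1024 floats
# (32 bits each); sharing stores 1024 floats plus 12 one-bit masks.
independent_bits = 12 * PROTO_SIZE * 32
shared_bits = PROTO_SIZE * 32 + 12 * PROTO_SIZE * 1
print(shared_bits / independent_bits)  # roughly 0.11, i.e. ~8.7x smaller
```

The masks differ per layer, so each layer still gets its own sub-network despite the shared weights.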

* Accepted by ACL 2023 

Leveraging Training Data in Few-Shot Prompting for Numerical Reasoning

Jun 09, 2023
Zhanming Jie, Wei Lu


Chain-of-thought (CoT) prompting with large language models has proven effective in numerous natural language processing tasks, but designing prompts that generalize well to diverse problem types can be challenging, especially in math word problem (MWP) solving. In addition, large amounts of training data with better diversity coverage are often available, but without CoT annotations, which limits the use of supervised learning techniques. To address these issues, we investigate two approaches to leveraging the training data in a few-shot prompting scenario: dynamic program prompting and program distillation. Our approach is largely inspired by Gao et al. (2022), who proposed replacing the CoT with programs as the intermediate reasoning step. Such a prompting strategy allows us to accurately verify answer correctness through program execution in MWP solving. Dynamic program prompting annotates the training data by sampling correct programs from a large language model, while program distillation adapts a smaller model to the program-annotated training data. Our experiments on three standard MWP datasets demonstrate the effectiveness of these approaches, yielding significant improvements over previous baselines for prompting and fine-tuning. Our results suggest that leveraging a large amount of training data can improve the generalization ability of prompts and boost the performance of fine-tuned small models in MWP solving.
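The two ingredients above can be sketched in a few lines (helper names and the word-overlap retriever are our illustrative assumptions, not the paper's code): keep only sampled programs whose execution matches the gold answer, then pick the most similar annotated problems as few-shot exemplars.

```python
# Hedged sketch of program verification and dynamic exemplar retrieval.

def execute(program):
    # A program is the intermediate reasoning as code; running it
    # verifies the answer it produces.
    scope = {}
    exec(program, {}, scope)
    return scope.get("answer")

def annotate(sampled_programs, gold):
    # Keep only sampled programs that actually yield the gold answer.
    return [p for p in sampled_programs if execute(p) == gold]

def overlap(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))

def build_prompt(test_q, annotated, k=2):
    # "Dynamic": exemplars are chosen per test question by similarity.
    ranked = sorted(annotated, key=lambda ex: overlap(test_q, ex["q"]),
                    reverse=True)
    shots = "\n\n".join(f"Q: {ex['q']}\n{ex['program']}" for ex in ranked[:k])
    return f"{shots}\n\nQ: {test_q}\n"

good = annotate(["answer = 3 + 4", "answer = 3 * 4"], gold=7)
annotated = [{"q": "Tom has 3 apples and buys 4 more. How many now?",
              "program": good[0]}]
prompt = build_prompt("Sam has 2 apples and buys 5 more. How many now?",
                      annotated, k=1)
```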

* ACL 2023 Findings 

Learning Multi-Step Reasoning by Solving Arithmetic Tasks

Jun 07, 2023
Tianduo Wang, Wei Lu


Mathematical reasoning is regarded as a necessary ability for Language Models (LMs). Recent works demonstrate large LMs' impressive performance in solving math problems. This success is attributed to their Chain-of-Thought (CoT) reasoning abilities, i.e., the ability to decompose complex questions into step-by-step reasoning chains, but such ability seems to emerge only in models with abundant parameters. This work investigates how to equip relatively small LMs with multi-step reasoning capabilities. We propose to inject such abilities by continually pre-training LMs on MsAT, a synthetic dataset composed of Multi-step Arithmetic Tasks. Our experiments on four math word problem datasets show the effectiveness of the proposed method in enhancing LMs' math reasoning abilities.
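A synthetic multi-step arithmetic task of this kind is easy to generate; the toy generator below is our sketch of the idea (the released MsAT data has its own format), pairing a composed question with the step-by-step chain a model would be trained to produce.

```python
import random

# Toy generator in the spirit of MsAT: build a multi-step arithmetic
# expression and the chain of single-step computations that solves it.

def make_task(rng, steps=3):
    value = rng.randint(1, 9)
    expr, chain = str(value), []
    for _ in range(steps):
        op, operand = rng.choice("+-*"), rng.randint(1, 9)
        new_value = eval(f"{value}{op}{operand}")
        chain.append(f"{value} {op} {operand} = {new_value}")
        expr, value = f"({expr}) {op} {operand}", new_value
    return {"question": f"Compute {expr}.", "steps": chain, "answer": value}

task = make_task(random.Random(0), steps=3)
# Each line of the chain makes one operation explicit, so the answer is
# reached by a sequence of single-step computations.
expr_text = task["question"][len("Compute "):].rstrip(".")
assert eval(expr_text) == task["answer"]
```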

* ACL 2023. Code and data are available at https://github.com/TianduoWang/MsAT 

Contextual Distortion Reveals Constituency: Masked Language Models are Implicit Parsers

Jun 01, 2023
Jiaxi Li, Wei Lu


Recent advancements in pre-trained language models (PLMs) have demonstrated that these models possess some degree of syntactic awareness. To leverage this knowledge, we propose a novel chart-based method for extracting parse trees from masked language models (LMs) without the need to train separate parsers. Our method computes a score for each span based on the distortion of contextual representations resulting from linguistic perturbations. We design a set of perturbations motivated by the linguistic concept of constituency tests, and use these to score each span by aggregating the distortion scores. To produce a parse tree, we use chart parsing to find the tree with the minimum score. Our method consistently outperforms previous state-of-the-art methods on English with masked LMs, and also demonstrates superior performance in a multilingual setting, outperforming the state of the art in 6 out of 8 languages. Notably, although our method does not involve parameter updates or extensive hyperparameter search, its performance can even surpass some unsupervised parsing methods that require fine-tuning. Our analysis highlights that the distortion of contextual representation resulting from syntactic perturbation can serve as an effective indicator of constituency across languages.
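The chart-parsing step can be sketched independently of the scoring model (the span scores below are made up; in the paper each span's score comes from the representation distortion caused by linguistic perturbations under a masked LM): a CKY-style search returns the binary tree whose spans have minimum total score.

```python
# Sketch of finding the minimum-score binary tree by chart parsing.

def min_score_tree(words, span_score):
    n = len(words)
    best, back = {}, {}
    for i in range(n):
        best[(i, i + 1)] = 0.0  # single words cost nothing
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Best split point for span (i, j).
            k_best = min(range(i + 1, j),
                         key=lambda k: best[(i, k)] + best[(k, j)])
            best[(i, j)] = span_score(i, j) + best[(i, k_best)] + best[(k_best, j)]
            back[(i, j)] = k_best

    def build(i, j):
        if j - i == 1:
            return words[i]
        k = back[(i, j)]
        return (build(i, k), build(k, j))

    return build(0, n)

# Hypothetical scores favouring the bracketing ((the cat) (sat down)):
# low scores mark spans that behave like constituents.
scores = {(0, 2): 0.1, (2, 4): 0.1, (1, 3): 5.0, (0, 3): 5.0, (1, 4): 5.0}
tree = min_score_tree("the cat sat down".split(),
                      lambda i, j: scores.get((i, j), 1.0))
assert tree == (("the", "cat"), ("sat", "down"))
```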

* Accepted by ACL 2023 

Tab-CoT: Zero-shot Tabular Chain of Thought

May 28, 2023
Ziqi Jin, Wei Lu


Chain-of-thought (CoT) prompting methods have been successful in various natural language processing (NLP) tasks thanks to their ability to unveil the underlying complex reasoning processes. Such reasoning processes are typically implicitly structured. Recent efforts have also started investigating methods to capture more explicitly structured reasoning procedures. In this work, we propose Tab-CoT, a novel tabular-format CoT prompting method that allows the complex reasoning process to be explicitly modelled in a highly structured manner. Despite its simplicity, we show that our approach is capable of reasoning across multiple dimensions (i.e., both rows and columns). We demonstrate our approach's strong zero-shot and few-shot capabilities through extensive experiments on a range of reasoning tasks.
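A tabular zero-shot prompt of this kind is just a question followed by a table header that the model continues row by row; treat the exact column names below as our assumption in the spirit of the paper, not the released prompt.

```python
# Minimal sketch of a tabular zero-shot CoT prompt.

def tab_cot_prompt(question):
    # Instead of "Let's think step by step", elicit reasoning as a
    # table: each row is one step, each column one reasoning dimension.
    return f"{question}\n|step|subquestion|process|result|\n"

prompt = tab_cot_prompt(
    "A pencil costs $2 and a pen costs $3. "
    "How much do 2 pencils and 1 pen cost?")
print(prompt)
# A model would then continue the table row by row, e.g.
# |1|cost of 2 pencils|2 * 2|4|
# |2|total cost|4 + 3|7|
```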

* Accepted by ACL 2023 Findings 

Better Sampling of Negatives for Distantly Supervised Named Entity Recognition

May 22, 2023
Lu Xu, Lidong Bing, Wei Lu


Distantly supervised named entity recognition (DS-NER) has been proposed to exploit automatically labeled training data instead of human annotations. Distantly annotated datasets are often noisy and contain a considerable number of false negatives. A recent approach uses weighted sampling to select a subset of negative samples for training, but it requires a good classifier to assign weights to the negative samples. In this paper, we propose a simple and straightforward approach that selects as training negatives the top negative samples with high similarities to all the positive samples. Our method achieves consistent performance improvements on four distantly supervised NER datasets. Our analysis also shows that it is critical to differentiate the true negatives from the false negatives.
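The selection idea can be sketched with toy span vectors (vectors, names, and the averaging rule are our illustrative assumptions): score each candidate negative by its similarity to the positive spans and keep the top-scoring ones, on the intuition that entity-like negatives are the informative, hard ones.

```python
# Toy sketch of similarity-based negative selection.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / ((dot(u, u) ** 0.5) * (dot(v, v) ** 0.5))

def top_negatives(positives, candidates, k):
    # Score = average similarity to all positive spans.
    def score(vec):
        return sum(cosine(vec, p) for p in positives) / len(positives)
    ranked = sorted(candidates, key=lambda c: score(c[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

positives = [[1.0, 0.0], [0.9, 0.1]]            # embeddings of entity spans
candidates = [("hard negative", [0.8, 0.2]),    # entity-like span
              ("easy negative", [0.0, 1.0])]    # clearly not an entity
assert top_negatives(positives, candidates, k=1) == ["hard negative"]
```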

* Accepted by ACL Findings 2023 

CIT-EmotionNet: CNN Interactive Transformer Network for EEG Emotion Recognition

May 07, 2023
Wei Lu, Hua Ma, Tien-Ping Tan


Emotion recognition using Electroencephalogram (EEG) signals has emerged as a significant research challenge in affective computing and intelligent interaction. However, effectively combining global and local features of EEG signals to improve performance in emotion recognition is still a difficult task. In this study, we propose a novel CNN Interactive Transformer Network for EEG Emotion Recognition, known as CIT-EmotionNet, which efficiently integrates global and local features of EEG signals. Initially, we convert raw EEG signals into spatial-frequency representations, which serve as inputs. Then, we integrate a Convolutional Neural Network (CNN) and a Transformer within a single framework in a parallel manner. Finally, we design a CNN interactive Transformer module, which facilitates the interaction and fusion of local and global features, thereby enhancing the model's ability to extract both types of features from EEG spatial-frequency representations. The proposed CIT-EmotionNet outperforms state-of-the-art methods, achieving average recognition accuracies of 98.57% and 92.09% on two publicly available datasets, SEED and SEED-IV, respectively.
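The parallel-branches-plus-interaction design can be shown schematically (every function below is a toy stand-in, not the actual CNN or Transformer): two branches process the same input, one capturing local structure and one global context, and an interaction step fuses them.

```python
# Schematic sketch of parallel local/global branches with fusion.

def local_branch(x):
    # Stand-in for the CNN: local averaging over a sliding window.
    return [sum(x[max(0, i - 1):i + 2]) / len(x[max(0, i - 1):i + 2])
            for i in range(len(x))]

def global_branch(x):
    # Stand-in for the Transformer: every position mixes in the
    # global mean, i.e. sees the whole sequence.
    mean = sum(x) / len(x)
    return [(xi + mean) / 2 for xi in x]

def interact(local, global_):
    # Stand-in for the interactive fusion module: element-wise mixing.
    return [(l + g) / 2 for l, g in zip(local, global_)]

features = [1.0, 2.0, 3.0, 4.0]       # toy spatial-frequency features
fused = interact(local_branch(features), global_branch(features))
assert len(fused) == len(features)
```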

* 10 pages, 3 tables 

Low-Resource Multi-Granularity Academic Function Recognition Based on Multiple Prompt Knowledge

May 05, 2023
Jiawei Liu, Zi Xiong, Yi Jiang, Yongqiang Ma, Wei Lu, Yong Huang, Qikai Cheng


Fine-tuning pre-trained language models (PLMs) such as SciBERT generally requires large amounts of annotated data to achieve state-of-the-art performance on a range of NLP tasks in the scientific domain. However, obtaining fine-tuning data for scientific NLP tasks is still challenging and expensive. Inspired by recent advances in prompt learning, we propose Mix Prompt Tuning (MPT), a semi-supervised method that alleviates the dependence on annotated data and improves the performance of multi-granularity academic function recognition tasks with a small number of labeled examples. Specifically, the proposed method provides multi-perspective representations by combining manual prompt templates with automatically learned continuous prompt templates, helping the given academic function recognition task take full advantage of the knowledge in PLMs. Based on these prompt templates and the fine-tuned PLM, a large number of pseudo labels are assigned to the unlabeled examples. Finally, we fine-tune the PLM using the pseudo training set. We evaluate our method on three academic function recognition tasks of different granularity, including the citation function, the abstract sentence function, and the keyword function, with datasets from the computer science and biomedical domains. Extensive experiments demonstrate the effectiveness of our method, with statistically significant improvements over strong baselines. In particular, it achieves an average increase of 5% in Macro-F1 score compared with fine-tuning, and 6% compared with other semi-supervised methods under low-resource settings. In addition, MPT is a general method that can be easily applied to other low-resource scientific classification tasks.
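The pseudo-labelling loop at the heart of this semi-supervised setup can be sketched as follows (the classifier here is a keyword-rule stand-in, not a prompt-tuned PLM, and the threshold value is our assumption): predict labels for unlabelled examples and keep only the confident predictions as pseudo training data.

```python
# Toy sketch of confidence-thresholded pseudo-labelling.

def pseudo_label(predict_proba, unlabeled, threshold=0.9):
    pseudo = []
    for text in unlabeled:
        probs = predict_proba(text)
        label, conf = max(probs.items(), key=lambda kv: kv[1])
        if conf >= threshold:  # only trust confident predictions
            pseudo.append((text, label))
    return pseudo

# Stand-in "model": keyword rules in place of the prompt-tuned PLM.
def toy_predict(text):
    if "we propose" in text:
        return {"method": 0.95, "background": 0.05}
    return {"method": 0.55, "background": 0.45}

unlabeled = ["we propose a new parser", "prior work studied this task"]
labeled = pseudo_label(toy_predict, unlabeled)
# Only the confident example survives; the other stays unlabeled.
assert labeled == [("we propose a new parser", "method")]
```

The resulting pairs would then be merged with the gold-labeled examples for another round of fine-tuning.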

* 22 pages, 5 figures 

MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos

Apr 19, 2023
Zicheng Zhang, Wei Wu, Wei Sun, Dangyang Tu, Wei Lu, Xiongkuo Min, Ying Chen, Guangtao Zhai


User-generated content (UGC) live videos often suffer from various distortions introduced during capture and thus exhibit diverse visual qualities. Such source videos are further compressed and transcoded by media server providers before being distributed to end-users. Because of the flourishing of UGC live videos, effective video quality assessment (VQA) tools are needed to monitor and perceptually optimize live streaming videos in the distribution process. In this paper, we address UGC Live VQA problems by constructing a first-of-its-kind subjective UGC Live VQA database and developing an effective evaluation tool. Concretely, 418 source UGC videos are collected in real live streaming scenarios, and 3,762 compressed versions at different bit rates are generated for the subsequent subjective VQA experiments. Based on the built database, we develop a Multi-Dimensional VQA (MD-VQA) evaluator that measures the visual quality of UGC live videos from the semantic, distortion, and motion aspects respectively. Extensive experimental results show that MD-VQA achieves state-of-the-art performance on both our UGC Live VQA database and existing compressed UGC VQA databases.

* Accepted to CVPR 2023 