Liang Ding

MerA: Merging Pretrained Adapters For Few-Shot Learning

Aug 30, 2023
Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, Dacheng Tao

Adapter tuning, which updates only a few parameters, has become a mainstream method for fine-tuning pretrained language models on downstream tasks. However, it often yields subpar results in few-shot learning. AdapterFusion, which assembles pretrained adapters using composition layers tailored to specific tasks, is a possible solution but significantly increases trainable parameters and deployment costs. Despite this, our preliminary study reveals that even single adapters can outperform AdapterFusion in few-shot learning, which motivates us to propose Merging Pretrained Adapters (MerA), an approach that efficiently incorporates pretrained adapters into a single model through model fusion. Extensive experiments on two PLMs demonstrate that MerA achieves substantial improvements over both single adapters and AdapterFusion. To further enhance the capacity of MerA, we also introduce a simple yet effective technique, referred to as the "same-track" setting, that merges adapters from the same track of pretraining tasks. With the "same-track" setting, we observe even more impressive gains, surpassing both full fine-tuning and adapter tuning by a substantial margin, e.g., 3.5% on MRPC and 5.0% on MNLI.
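
For illustration, a minimal sketch of the adapter-merging idea: the weights of several pretrained adapters are fused into a single adapter by (weighted) parameter averaging. The function name and the plain averaging are assumptions of this sketch; MerA's actual fusion procedure may align parameters before merging and can differ.

```python
import torch
from typing import Dict, List, Optional

def merge_adapters(adapters: List[Dict[str, torch.Tensor]],
                   weights: Optional[List[float]] = None) -> Dict[str, torch.Tensor]:
    """Fuse several pretrained adapters into one set of adapter weights."""
    if weights is None:
        weights = [1.0 / len(adapters)] * len(adapters)  # uniform fusion
    merged = {}
    for name in adapters[0]:
        # Weighted sum of the same parameter across all adapters.
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, adapters))
    return merged
```

The merged adapter is then plugged back into the PLM and fine-tuned on the few-shot task as usual, so the inference cost stays that of a single adapter.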

Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models

Aug 29, 2023
Qingyue Wang, Liang Ding, Yanan Cao, Zhiliang Tian, Shi Wang, Dacheng Tao, Li Guo

Most open-domain dialogue systems suffer from forgetting important information, especially in long-term conversations. Existing works usually train a dedicated retriever or summarizer to obtain key information from the past, which is time-consuming and highly dependent on the quality of labeled data. To alleviate this problem, we propose to recursively generate summaries (memory) using large language models (LLMs) to enhance long-term memory ability. Specifically, our method first prompts the LLM to memorize small dialogue contexts and then recursively produces new memory from the previous memory and the following contexts. Finally, the LLM can easily generate a highly consistent response with the help of the latest memory. We evaluate our method using ChatGPT and text-davinci-003, and experiments on a widely used public dataset show that our method can generate more consistent responses in long-context conversations. Notably, our method is also a potential solution for enabling LLMs to model extremely long contexts. Code and scripts will be released later.
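
A minimal sketch of the recursive scheme, assuming a hypothetical text-in/text-out `chat` function that wraps the LLM; the prompts here are illustrative, not the paper's exact ones.

```python
from typing import Callable, List

def build_memory(turns: List[str], chat: Callable[[str], str],
                 chunk_size: int = 4) -> str:
    """Recursively fold dialogue turns into a running summary ("memory")."""
    memory = ""
    for i in range(0, len(turns), chunk_size):
        context = "\n".join(turns[i:i + chunk_size])
        # Produce new memory from the previous memory and the new context.
        memory = chat(
            f"Previous memory:\n{memory}\n\n"
            f"New dialogue:\n{context}\n\n"
            "Rewrite the memory so that it keeps all important information."
        )
    return memory

# At response time, the latest memory conditions the reply, e.g.:
# reply = chat(f"Memory:\n{memory}\n\nUser: {user_turn}\nAssistant:")
```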

Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?

Aug 25, 2023
Fei Wang, Liang Ding, Jun Rao, Ye Liu, Li Shen, Changxing Ding

The multimedia community has shown significant interest in perceiving and representing the physical world with multimodal pretrained neural network models, among which vision-language pretraining (VLP) is currently the most captivating topic. However, few endeavors have been dedicated to exploring 1) whether essential linguistic knowledge (e.g., semantics and syntax) can be extracted during VLP, and 2) how such linguistic knowledge impacts or enhances multimodal alignment. In response, we aim to elucidate the impact of comprehensive linguistic knowledge, including semantic expression and syntactic structure, on multimodal alignment. Specifically, we design and release SNARE, the first large-scale multimodal alignment probing benchmark, to detect vital linguistic components, e.g., lexical, semantic, and syntactic knowledge, through four tasks: Semantic structure, Negation logic, Attribute ownership, and Relationship composition. Based on our proposed probing benchmark, our holistic analyses of five advanced VLP models illustrate that these models: i) are insensitive to complex syntactic structures and rely on content words for sentence comprehension; ii) show limited comprehension of sentences combined with negation; iii) face challenges in determining the presence of actions or spatial relationships within visual information and struggle to verify the correctness of triple combinations. We make our benchmark and code available at https://github.com/WangFei-2019/SNARE/.

* [TL;DR] We design and release SNARE, the first large-scale multimodal alignment probing benchmark for current vision-language pretrained models.
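
As a rough illustration of how such an alignment probe can be scored (not necessarily SNARE's exact protocol), each item pairs an image with a correct caption and a minimally perturbed distractor (e.g., a negated or relation-swapped caption), and the model passes the item if it ranks the correct caption higher; `score` is an assumed image-text matching function, such as a CLIP-style similarity.

```python
from typing import Callable, List, Tuple

def probe_accuracy(items: List[Tuple[object, str, str]],
                   score: Callable[[object, str], float]) -> float:
    """Accuracy of a VLP model on (image, correct caption, distractor) triples."""
    hits = sum(score(image, positive) > score(image, negative)
               for image, positive, negative in items)
    return hits / max(len(items), 1)
```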

Efficient Federated Learning via Local Adaptive Amended Optimizer with Linear Speedup

Jul 30, 2023
Yan Sun, Li Shen, Hao Sun, Liang Ding, Dacheng Tao

Adaptive optimization has achieved notable success in distributed learning, but extending adaptive optimizers to federated learning (FL) suffers from severe inefficiency, including (i) rugged convergence due to inaccurate gradient estimation in the global adaptive optimizer, and (ii) client drift exacerbated by local over-fitting with local adaptive optimizers. In this work, we propose a novel momentum-based algorithm that utilizes global gradient descent and a locally adaptive amended optimizer to tackle these difficulties. Specifically, we incorporate a locally amended technique into the adaptive optimizer, named Federated Local ADaptive Amended optimizer (FedLADA), which estimates the global average offset in the previous communication round and corrects the local offset through a momentum-like term to further improve empirical training speed and mitigate heterogeneous over-fitting. Theoretically, we establish the convergence rate of FedLADA with a linear speedup property in the non-convex case under partial participation settings. Moreover, we conduct extensive experiments on real-world datasets to demonstrate the efficacy of our proposed FedLADA, which greatly reduces communication rounds and achieves higher accuracy than several baselines.

* IEEE Transactions on Pattern Analysis and Machine Intelligence 
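
A sketch of the core idea only: a local adaptive (Adam-like) update is blended with the global average offset estimated from the previous communication round to counteract client drift. The variable names, the blending coefficient, and the exact correction used by FedLADA are assumptions of this sketch.

```python
import torch

def local_amended_step(param: torch.Tensor, grad: torch.Tensor, state: dict,
                       global_offset: torch.Tensor, lr: float = 1e-3,
                       beta1: float = 0.9, beta2: float = 0.999,
                       eps: float = 1e-8, lam: float = 0.5) -> None:
    """One local step amended toward the previous round's global offset.

    `state` holds the running moments "m" and "v", initialized to zeros.
    """
    # Local adaptive (Adam-like) statistics.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    local_dir = state["m"] / (state["v"].sqrt() + eps)
    # Momentum-like correction: mix the local adaptive direction with the
    # global average offset to mitigate heterogeneous over-fitting.
    update = lam * local_dir + (1 - lam) * global_offset
    param.data.add_(update, alpha=-lr)
```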

Free-Form Composition Networks for Egocentric Action Recognition

Jul 13, 2023
Haoran Wang, Qinghua Cheng, Baosheng Yu, Yibing Zhan, Dapeng Tao, Liang Ding, Haibin Ling

Egocentric action recognition is gaining significant attention in the field of human action recognition. In this paper, we address the data scarcity issue in egocentric action recognition from a compositional generalization perspective. To tackle this problem, we propose a free-form composition network (FFCN) that can simultaneously learn disentangled verb, preposition, and noun representations and then use them to compose new samples in the feature space for rare classes of action videos. First, we use a graph to capture the spatial-temporal relations among different hand/object instances in each action video. We thus decompose each action into a set of verb and preposition spatial-temporal representations using the edge features in the graph. The temporal decomposition extracts verb and preposition representations from different video frames, while the spatial decomposition adaptively learns verb and preposition representations from action-related instances in each frame. With these spatial-temporal representations of verbs and prepositions, we can compose new samples for rare classes in a free-form manner that is not restricted to a rigid verb-plus-noun form. The proposed FFCN can directly generate new training samples for rare classes, hence significantly improving action recognition performance. We evaluate our method on three popular egocentric action recognition datasets, Something-Something V2, H2O, and EPIC-KITCHENS-100, and the experimental results demonstrate its effectiveness in handling data scarcity, including long-tailed and few-shot egocentric action recognition.
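
A toy sketch of the composition step, under the assumption that the disentangled features are simply concatenated: verb and preposition representations drawn from one set of videos are combined with noun representations from another, so a rare verb-noun combination can borrow each component from classes where it is well represented. FFCN's actual composition operator may be more elaborate.

```python
import random
from typing import Dict, List, Tuple
import torch

def compose_rare_samples(verb_bank: Dict[str, List[torch.Tensor]],
                         prep_bank: Dict[str, List[torch.Tensor]],
                         noun_bank: Dict[str, List[torch.Tensor]],
                         rare_classes: List[Tuple[str, str, str]],
                         n_per_class: int = 8):
    """Synthesize feature-space samples for rare (verb, preposition, noun) classes."""
    synthetic = []
    for verb, prep, noun in rare_classes:
        for _ in range(n_per_class):
            # Components may come from different source videos, which is
            # what makes the composition "free-form".
            feat = torch.cat([random.choice(verb_bank[verb]),
                              random.choice(prep_bank[prep]),
                              random.choice(noun_bank[noun])], dim=-1)
            synthetic.append((feat, (verb, prep, noun)))
    return synthetic
```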

Unsupervised Dense Retrieval with Relevance-Aware Contrastive Pre-Training

Jun 05, 2023
Yibin Lei, Liang Ding, Yu Cao, Changtong Zan, Andrew Yates, Dacheng Tao

Dense retrievers have achieved impressive performance, but their demand for abundant training data limits their application scenarios. Contrastive pre-training, which constructs pseudo-positive examples from unlabeled data, has shown great potential to solve this problem. However, the pseudo-positive examples crafted by data augmentations can be irrelevant. To this end, we propose relevance-aware contrastive learning. It takes the intermediate-trained model itself as an imperfect oracle to estimate the relevance of positive pairs and adaptively weights the contrastive loss of different pairs according to the estimated relevance. Our method consistently improves the SOTA unsupervised Contriever model on the BEIR and open-domain QA retrieval benchmarks. Further exploration shows that our method can not only beat BM25 after further pre-training on the target corpus but also serve as a good few-shot learner. Our code is publicly available at https://github.com/Yibin-Lei/ReContriever.

* ACL 2023 Findings (Short), 5 pages main + 1 page references + 1 page appendix 
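
A minimal sketch of a relevance-weighted in-batch contrastive loss, where the model's own query-positive similarities act as the imperfect relevance oracle and reweight the per-pair InfoNCE terms; the exact weighting function used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def relevance_weighted_contrastive_loss(q: torch.Tensor, p: torch.Tensor,
                                        temperature: float = 0.05) -> torch.Tensor:
    """In-batch contrastive loss with relevance-aware per-pair weighting."""
    # q, p: [batch, dim] query and pseudo-positive embeddings.
    q = F.normalize(q, dim=-1)
    p = F.normalize(p, dim=-1)
    logits = q @ p.t() / temperature                       # [batch, batch]
    labels = torch.arange(q.size(0), device=q.device)
    per_pair_loss = F.cross_entropy(logits, labels, reduction="none")
    # Relevance estimate of each pseudo-positive pair; detached so the
    # oracle does not receive gradients through the weights.
    rel = logits.diagonal().detach()
    weights = torch.softmax(rel, dim=0) * q.size(0)        # mean weight ~ 1
    return (weights * per_pair_loss).mean()
```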

Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts for Zero-Shot Dialogue State Tracking

Jun 01, 2023
Qingyue Wang, Liang Ding, Yanan Cao, Yibing Zhan, Zheng Lin, Shi Wang, Dacheng Tao, Li Guo

Zero-shot transfer learning for Dialogue State Tracking (DST) helps handle a variety of task-oriented dialogue domains without the cost of collecting in-domain data. Existing works mainly study common data- or model-level augmentation methods to enhance generalization but fail to effectively decouple the semantics of samples, limiting the zero-shot performance of DST. In this paper, we present a simple and effective "divide, conquer and combine" solution, which explicitly disentangles the semantics of seen data and enhances performance and robustness with a mixture-of-experts mechanism. Specifically, we divide the seen data into semantically independent subsets and train corresponding experts; newly unseen samples are then mapped to these experts and inferred with our designed ensemble inference. Extensive experiments on MultiWOZ2.1 with T5-Adapter show that our scheme significantly and consistently improves zero-shot performance, achieving SOTA in settings without external knowledge, with only 10M trainable parameters.

* Accepted to ACL 2023 
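
An illustrative sketch of the ensemble-inference step, assuming each expert's data subset is summarized by a centroid embedding: an unseen sample is softly mapped to the experts by similarity to the centroids, and their predictions are mixed with the resulting gate. The paper's actual mapping and ensemble rule may differ.

```python
import torch

def ensemble_infer(sample_emb: torch.Tensor, expert_logits: torch.Tensor,
                   centroids: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Combine semantically independent experts for one unseen sample."""
    # sample_emb: [dim]; centroids: [n_experts, dim];
    # expert_logits: [n_experts, n_labels] predictions from each expert.
    sims = torch.cosine_similarity(sample_emb.unsqueeze(0), centroids, dim=-1)
    gate = torch.softmax(sims / temperature, dim=0)          # [n_experts]
    return (gate.unsqueeze(-1) * expert_logits).sum(dim=0)   # [n_labels]
```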

Self-Evolution Learning for Discriminative Language Model Pretraining

May 24, 2023
Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

Masked language modeling, widely used in discriminative language model (e.g., BERT) pretraining, commonly adopts a random masking strategy. However, random masking does not consider the importance of different words to the sentence meaning, even though some of them are more worth predicting. Therefore, various masking strategies (e.g., entity-level masking) have been proposed, but most of them require expensive prior knowledge and generally train from scratch without reusing existing model weights. In this paper, we present Self-Evolution learning (SE), a simple and effective token masking and learning method to fully and wisely exploit the knowledge in the data. SE focuses on learning the informative yet under-explored tokens and adaptively regularizes training by introducing a novel Token-specific Label Smoothing approach. Experiments on 10 tasks show that our SE brings consistent and significant improvements (+1.43 to +2.12 average score) across different PLMs. In-depth analyses demonstrate that SE improves linguistic knowledge learning and generalization.

* Accepted to Findings of ACL2023 
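
A rough sketch of how a token-specific smoothed target could be formed (the paper's exact Token-specific Label Smoothing formulation may differ): each masked token's one-hot label is mixed with a distribution specific to that token, here taken from a reference pass of the model itself.

```python
import torch
import torch.nn.functional as F

def token_specific_label_smoothing_loss(logits: torch.Tensor, targets: torch.Tensor,
                                        ref_probs: torch.Tensor,
                                        alpha: float = 0.1) -> torch.Tensor:
    """MLM loss against a per-token smoothed target distribution."""
    # logits, ref_probs: [n_masked, vocab]; targets: [n_masked]
    vocab = logits.size(-1)
    one_hot = F.one_hot(targets, num_classes=vocab).float()
    # Token-specific smoothing: each token mixes its one-hot label with its
    # own reference distribution instead of a uniform one.
    smoothed = (1 - alpha) * one_hot + alpha * ref_probs
    log_probs = F.log_softmax(logits, dim=-1)
    return -(smoothed * log_probs).sum(dim=-1).mean()
```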

Revisiting Token Dropping Strategy in Efficient BERT Pretraining

May 24, 2023
Qihuang Zhong, Liang Ding, Juhua Liu, Xuebo Liu, Min Zhang, Bo Du, Dacheng Tao

Token dropping is a recently proposed strategy to speed up the pretraining of masked language models, such as BERT, by skipping the computation of a subset of the input tokens at several middle layers. It can effectively reduce training time without much performance degradation on downstream tasks. However, we empirically find that token dropping is prone to a semantic-loss problem and falls short in handling semantics-intensive tasks. Motivated by this, we propose a simple yet effective semantic-consistent learning method (ScTD) to improve token dropping. ScTD encourages the model to learn how to preserve semantic information in the representation space. Extensive experiments on 12 tasks show that, with the help of our ScTD, token dropping achieves consistent and significant performance gains across all task types and model sizes. More encouragingly, ScTD saves up to 57% of pretraining time and brings up to +1.56% average improvement over vanilla token dropping.

* Accepted to ACL2023 Main Conference 
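
A sketch of one way a semantic-consistent objective can be instantiated (the exact ScTD loss may differ): pooled sentence representations from the token-dropped forward pass are pulled toward those of the full-input pass, so dropping tokens in the middle layers loses less semantic information.

```python
import torch
import torch.nn.functional as F

def semantic_consistency_loss(full_repr: torch.Tensor,
                              dropped_repr: torch.Tensor) -> torch.Tensor:
    """Pull the token-dropped pass toward the full pass in representation space."""
    # full_repr, dropped_repr: [batch, dim] pooled sentence representations.
    full = F.normalize(full_repr.detach(), dim=-1)   # full pass acts as the target
    dropped = F.normalize(dropped_repr, dim=-1)
    return (1.0 - (full * dropped).sum(dim=-1)).mean()
```

This auxiliary term would be added to the usual MLM objective during token-dropped pretraining.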