Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ming Gao

Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding

Oct 19, 2023

Jianing Wang, Qiushi Sun, Nuo Chen, Chengyu Wang, Jun Huang, Ming Gao, Xiang Li

Abstract:The recent success of large pre-trained language models (PLMs) heavily hinges on massive labeled data, which typically produces inferior performance in low-resource scenarios. To remedy this dilemma, we study self-training as one of the predominant semi-supervised learning (SSL) approaches, which utilizes large-scale unlabeled data to generate synthetic examples. However, too many noisy labels will hurt the model performance, and the self-training procedure requires multiple training iterations making it more expensive if all the model parameters of the PLM are updated. This paper presents UPET, a novel Uncertainty-aware Parameter-Efficient self-Training framework to effectively and efficiently address the labeled data scarcity issue. Specifically, we incorporate Monte Carlo (MC) dropout in Bayesian neural network (BNN) to perform uncertainty estimation for the teacher model and then judiciously select reliable pseudo-labeled examples based on confidence and certainty. During the student training, we introduce multiple parameter-efficient learning (PEL) paradigms that allow the optimization of only a small percentage of parameters. We also propose a novel Easy-Hard Contrastive Tuning to enhance the robustness and generalization. Extensive experiments over multiple downstream tasks demonstrate that UPET achieves a substantial improvement in terms of performance and efficiency. Our codes and data are released at https: //github.com/wjn1996/UPET.

* Accepted by Findings of EMNLP 2023

Via

Access Paper or Ask Questions

Boosting In-Context Learning with Factual Knowledge

Sep 26, 2023

Jianing Wang, Chengyu Wang, Chuanqi Tan, Jun Huang, Ming Gao

Figure 1 for Boosting In-Context Learning with Factual Knowledge

Figure 2 for Boosting In-Context Learning with Factual Knowledge

Figure 3 for Boosting In-Context Learning with Factual Knowledge

Figure 4 for Boosting In-Context Learning with Factual Knowledge

Abstract:In-Context Learning (ICL) over Large language models (LLMs) aims at solving previously unseen tasks by conditioning on a few training examples, eliminating the need for parameter updates and achieving competitive performance. In this paper, we demonstrate that factual knowledge is imperative for the performance of ICL in three core facets, i.e., the inherent knowledge learned in LLMs, the factual knowledge derived from the selected in-context examples, and the knowledge biases in LLMs for output generation. To unleash the power of LLMs in few-shot learning scenarios, we introduce a novel Knowledgeable In-Context Tuning (KICT) framework to further improve the performance of ICL: 1) injecting factual knowledge to LLMs during continual self-supervised pre-training, 2) judiciously selecting the examples with high knowledge relevance, and 3) calibrating the prediction results based on prior knowledge. We evaluate the proposed approaches on auto-regressive LLMs (e.g., GPT-style models) over multiple text classification and question answering tasks. Experimental results demonstrate that KICT substantially outperforms strong baselines, and improves by more than 13% and 7% of accuracy on text classification and question answering tasks, respectively.

Via

Access Paper or Ask Questions

DFRD: Data-Free Robustness Distillation for Heterogeneous Federated Learning

Sep 24, 2023

Kangyang Luo, Shuai Wang, Yexuan Fu, Xiang Li, Yunshi Lan, Ming Gao

Abstract:Federated Learning (FL) is a privacy-constrained decentralized machine learning paradigm in which clients enable collaborative training without compromising private data. However, how to learn a robust global model in the data-heterogeneous and model-heterogeneous FL scenarios is challenging. To address it, we resort to data-free knowledge distillation to propose a new FL method (namely DFRD). DFRD equips a conditional generator on the server to approximate the training space of the local models uploaded by clients, and systematically investigates its training in terms of fidelity, transferability} and diversity. To overcome the catastrophic forgetting of the global model caused by the distribution shifts of the generator across communication rounds, we maintain an exponential moving average copy of the generator on the server. Additionally, we propose dynamic weighting and label sampling to accurately extract knowledge from local models. Finally, our extensive experiments on various image classification tasks illustrate that DFRD achieves significant performance gains compared to SOTA baselines.

* Published as a conference paper at NeurIPS 2023

Via

Access Paper or Ask Questions

Exchanging-based Multimodal Fusion with Transformer

Sep 05, 2023

Renyu Zhu, Chengcheng Han, Yong Qian, Qiushi Sun, Xiang Li, Ming Gao, Xuezhi Cao, Yunsen Xian

Figure 1 for Exchanging-based Multimodal Fusion with Transformer

Figure 2 for Exchanging-based Multimodal Fusion with Transformer

Figure 3 for Exchanging-based Multimodal Fusion with Transformer

Figure 4 for Exchanging-based Multimodal Fusion with Transformer

Abstract:We study the problem of multimodal fusion in this paper. Recent exchanging-based methods have been proposed for vision-vision fusion, which aim to exchange embeddings learned from one modality to the other. However, most of them project inputs of multimodalities into different low-dimensional spaces and cannot be applied to the sequential input data. To solve these issues, in this paper, we propose a novel exchanging-based multimodal fusion model MuSE for text-vision fusion based on Transformer. We first use two encoders to separately map multimodal inputs into different low-dimensional spaces. Then we employ two decoders to regularize the embeddings and pull them into the same space. The two decoders capture the correlations between texts and images with the image captioning task and the text-to-image generation task, respectively. Further, based on the regularized embeddings, we present CrossTransformer, which uses two Transformer encoders with shared parameters as the backbone model to exchange knowledge between multimodalities. Specifically, CrossTransformer first learns the global contextual information of the inputs in the shallow layers. After that, it performs inter-modal exchange by selecting a proportion of tokens in one modality and replacing their embeddings with the average of embeddings in the other modality. We conduct extensive experiments to evaluate the performance of MuSE on the Multimodal Named Entity Recognition task and the Multimodal Sentiment Analysis task. Our results show the superiority of MuSE against other competitors. Our code and data are provided at https://github.com/RecklessRonan/MuSE.

Via

Access Paper or Ask Questions

Graph Self-Contrast Representation Learning

Sep 05, 2023

Minjie Chen, Yao Cheng, Ye Wang, Xiang Li, Ming Gao

Figure 1 for Graph Self-Contrast Representation Learning

Figure 2 for Graph Self-Contrast Representation Learning

Figure 3 for Graph Self-Contrast Representation Learning

Figure 4 for Graph Self-Contrast Representation Learning

Abstract:Graph contrastive learning (GCL) has recently emerged as a promising approach for graph representation learning. Some existing methods adopt the 1-vs-K scheme to construct one positive and K negative samples for each graph, but it is difficult to set K. For those methods that do not use negative samples, it is often necessary to add additional strategies to avoid model collapse, which could only alleviate the problem to some extent. All these drawbacks will undoubtedly have an adverse impact on the generalizability and efficiency of the model. In this paper, to address these issues, we propose a novel graph self-contrast framework GraphSC, which only uses one positive and one negative sample, and chooses triplet loss as the objective. Specifically, self-contrast has two implications. First, GraphSC generates both positive and negative views of a graph sample from the graph itself via graph augmentation functions of various intensities, and use them for self-contrast. Second, GraphSC uses Hilbert-Schmidt Independence Criterion (HSIC) to factorize the representations into multiple factors and proposes a masked self-contrast mechanism to better separate positive and negative samples. Further, Since the triplet loss only optimizes the relative distance between the anchor and its positive/negative samples, it is difficult to ensure the absolute distance between the anchor and positive sample. Therefore, we explicitly reduced the absolute distance between the anchor and positive sample to accelerate convergence. Finally, we conduct extensive experiments to evaluate the performance of GraphSC against 19 other state-of-the-art methods in both unsupervised and transfer learning settings.

* ICDM 2023(Regular)

Via

Access Paper or Ask Questions

TransPrompt v2: A Transferable Prompting Framework for Cross-task Text Classification

Aug 29, 2023

Jianing Wang, Chengyu Wang, Cen Chen, Ming Gao, Jun Huang, Aoying Zhou

Abstract:Text classification is one of the most imperative tasks in natural language processing (NLP). Recent advances with pre-trained language models (PLMs) have shown remarkable success on this task. However, the satisfying results obtained by PLMs heavily depend on the large amounts of task-specific labeled data, which may not be feasible in many application scenarios due to data access and privacy constraints. The recently-proposed prompt-based fine-tuning paradigm improves the performance of PLMs for few-shot text classification with task-specific templates. Yet, it is unclear how the prompting knowledge can be transferred across tasks, for the purpose of mutual reinforcement. We propose TransPrompt v2, a novel transferable prompting framework for few-shot learning across similar or distant text classification tasks. For learning across similar tasks, we employ a multi-task meta-knowledge acquisition (MMA) procedure to train a meta-learner that captures the cross-task transferable knowledge. For learning across distant tasks, we further inject the task type descriptions into the prompt, and capture the intra-type and inter-type prompt embeddings among multiple distant tasks. Additionally, two de-biasing techniques are further designed to make the trained meta-learner more task-agnostic and unbiased towards any tasks. After that, the meta-learner can be adapted to each specific task with better parameters initialization. Extensive experiments show that TransPrompt v2 outperforms single-task and cross-task strong baselines over multiple NLP tasks and datasets. We further show that the meta-learner can effectively improve the performance of PLMs on previously unseen tasks. In addition, TransPrompt v2 also outperforms strong fine-tuning baselines when learning with full training sets.

Via

Access Paper or Ask Questions

UPFL: Unsupervised Personalized Federated Learning towards New Clients

Jul 29, 2023

Tiandi Ye, Cen Chen, Yinggui Wang, Xiang Li, Ming Gao

Abstract:Personalized federated learning has gained significant attention as a promising approach to address the challenge of data heterogeneity. In this paper, we address a relatively unexplored problem in federated learning. When a federated model has been trained and deployed, and an unlabeled new client joins, providing a personalized model for the new client becomes a highly challenging task. To address this challenge, we extend the adaptive risk minimization technique into the unsupervised personalized federated learning setting and propose our method, FedTTA. We further improve FedTTA with two simple yet effective optimization strategies: enhancing the training of the adaptation model with proxy regularization and early-stopping the adaptation through entropy. Moreover, we propose a knowledge distillation loss specifically designed for FedTTA to address the device heterogeneity. Extensive experiments on five datasets against eleven baselines demonstrate the effectiveness of our proposed FedTTA and its variants. The code is available at: https://github.com/anonymous-federated-learning/code.

Via

Access Paper or Ask Questions

You Can Backdoor Personalized Federated Learning

Jul 29, 2023

Tiandi Ye, Cen Chen, Yinggui Wang, Xiang Li, Ming Gao

Abstract:Backdoor attacks pose a significant threat to the security of federated learning systems. However, existing research primarily focuses on backdoor attacks and defenses within the generic FL scenario, where all clients collaborate to train a single global model. \citet{qin2023revisiting} conduct the first study of backdoor attacks in the personalized federated learning (pFL) scenario, where each client constructs a personalized model based on its local data. Notably, the study demonstrates that pFL methods with partial model-sharing can significantly boost robustness against backdoor attacks. In this paper, we whistleblow that pFL methods with partial model-sharing are still vulnerable to backdoor attacks in the absence of any defense. We propose three backdoor attack methods: BapFL, BapFL+, and Gen-BapFL, and we empirically demonstrate that they can effectively attack the pFL methods. Specifically, the key principle of BapFL lies in maintaining clean local parameters while implanting the backdoor into the global parameters. BapFL+ generalizes the attack success to benign clients by introducing Gaussian noise to the local parameters. Furthermore, we assume the collaboration of malicious clients and propose Gen-BapFL, which leverages meta-learning techniques to further enhances attack generalization. We evaluate our proposed attack methods against two classic pFL methods with partial model-sharing, FedPer and LG-FedAvg. Extensive experiments on four FL benchmark datasets demonstrate the effectiveness of our proposed attack methods. Additionally, we assess the defense efficacy of various defense strategies against our proposed attacks and find that Gradient Norm-Clipping is particularly effective. It is crucial to note that pFL method is not always secure in the presence of backdoor attacks, and we hope to inspire further research on attack and defense in pFL scenarios.

Via

Access Paper or Ask Questions

Boosting Language Models Reasoning with Chain-of-Knowledge Prompting

Jun 10, 2023

Jianing Wang, Qiushi Sun, Nuo Chen, Xiang Li, Ming Gao

Abstract:Recently, Chain-of-Thought (CoT) prompting has delivered success on complex reasoning tasks, which aims at designing a simple prompt like ``Let's think step by step'' or multiple in-context exemplars with well-designed rationales to elicit Large Language Models (LLMs) to generate intermediate reasoning steps. However, the generated rationales often come with mistakes, making unfactual and unfaithful reasoning chains. To mitigate this brittleness, we propose a novel Chain-of-Knowledge (CoK) prompting, where we aim at eliciting LLMs to generate explicit pieces of knowledge evidence in the form of structure triple. This is inspired by our human behaviors, i.e., we can draw a mind map or knowledge map as the reasoning evidence in the brain before answering a complex question. Benefiting from CoK, we additionally introduce a F^2-Verification method to estimate the reliability of the reasoning chains in terms of factuality and faithfulness. For the unreliable response, the wrong evidence can be indicated to prompt the LLM to rethink. Extensive experiments demonstrate that our method can further improve the performance of commonsense, factual, symbolic, and arithmetic reasoning tasks.

Via

Access Paper or Ask Questions

CLIP3Dstyler: Language Guided 3D Arbitrary Neural Style Transfer

May 26, 2023

Ming Gao, YanWu Xu, Yang Zhao, Tingbo Hou, Chenkai Zhao, Mingming Gong

Figure 1 for CLIP3Dstyler: Language Guided 3D Arbitrary Neural Style Transfer

Figure 2 for CLIP3Dstyler: Language Guided 3D Arbitrary Neural Style Transfer

Figure 3 for CLIP3Dstyler: Language Guided 3D Arbitrary Neural Style Transfer

Figure 4 for CLIP3Dstyler: Language Guided 3D Arbitrary Neural Style Transfer

Abstract:In this paper, we propose a novel language-guided 3D arbitrary neural style transfer method (CLIP3Dstyler). We aim at stylizing any 3D scene with an arbitrary style from a text description, and synthesizing the novel stylized view, which is more flexible than the image-conditioned style transfer. Compared with the previous 2D method CLIPStyler, we are able to stylize a 3D scene and generalize to novel scenes without re-train our model. A straightforward solution is to combine previous image-conditioned 3D style transfer and text-conditioned 2D style transfer \bigskip methods. However, such a solution cannot achieve our goal due to two main challenges. First, there is no multi-modal model matching point clouds and language at different feature scales (low-level, high-level). Second, we observe a style mixing issue when we stylize the content with different style conditions from text prompts. To address the first issue, we propose a 3D stylization framework to match the point cloud features with text features in local and global views. For the second issue, we propose an improved directional divergence loss to make arbitrary text styles more distinguishable as a complement to our framework. We conduct extensive experiments to show the effectiveness of our model on text-guided 3D scene style transfer.

* 17 pages, 14 figures

Via

Access Paper or Ask Questions