Tianyi Zhou

It Takes One to Tango but More Make Trouble? In-Context Training with Different Number of Demonstrations

Mar 14, 2023
Jiuhai Chen, LiChang Chen, Tianyi Zhou

Large language models (LLMs) are capable of performing complex reasoning via in-context learning (ICL) when provided with a few input-output demonstrations (demos), and become even more powerful when the demos include intermediate reasoning steps ("chain of thought (CoT)"). Is it necessary to use multiple demos in ICL? In this paper, we study ICL using fewer demos for each test query on the tasks in Wei et al. (2022). Surprisingly, we do not observe significant degradation when using only one randomly chosen demo. To study this phenomenon, for each test query we categorize demos into "correct demos", which lead to the correct answer, and "wrong demos", which result in wrong answers. Our analysis reveals an inherent bias in these widely studied datasets: most demos are correct for a majority of test queries, which explains the good performance of a single random demo. Moreover, ICL (with and without CoT) using only one correct demo significantly outperforms the all-demo ICL adopted by most previous works, indicating a weakness of LLMs in finding correct demo(s) for input queries, which is difficult to evaluate on the biased datasets. Furthermore, we observe a counterintuitive behavior of ICL with multiple demos: its accuracy degrades (improves) when given more correct (wrong) demos. This implies that ICL can be easily misguided by interference among demos and their spurious correlations. Our analyses highlight several fundamental challenges that need to be addressed in LLM training, ICL, and benchmark design.
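
The per-query categorization of demos into "correct" and "wrong" can be illustrated with a small sketch: run one-demo ICL for each candidate demo and check whether the model's answer matches the gold answer. The prompt format and the `query_llm` stub below are hypothetical placeholders, not the authors' code.

```python
# Sketch: label each demo as "correct" or "wrong" for a given test query
# by running one-demo in-context learning (ICL).

def query_llm(prompt: str) -> str:
    """Placeholder for an actual LLM call; replace with a real API/model."""
    return ""

def build_prompt(demo: dict, question: str) -> str:
    # One demo (optionally with a chain-of-thought rationale) followed by the test question.
    rationale = f" {demo['cot']}" if demo.get("cot") else ""
    return (f"Q: {demo['question']}\nA:{rationale} {demo['answer']}\n\n"
            f"Q: {question}\nA:")

def categorize_demos(demos, question: str, gold_answer: str):
    """Split the demo pool into correct/wrong demos for one test query."""
    correct, wrong = [], []
    for demo in demos:
        pred = query_llm(build_prompt(demo, question))
        (correct if gold_answer in pred else wrong).append(demo)
    return correct, wrong
```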

Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream Tasks

Jan 27, 2023
Haiyan Zhao, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang

As a few large-scale pre-trained models become the major choices for various applications, new challenges arise for model pruning, e.g., can we avoid pruning the same model from scratch for every downstream task? How can we reuse the pruning results of previous tasks to accelerate the pruning for a new task? To address these challenges, we create a small model for a new task from the pruned models of similar tasks. We show that a few fine-tuning steps on this model suffice to produce a promising pruned model for the new task. We study this "meta-pruning" from nearest tasks on two major classes of pre-trained models, convolutional neural networks (CNNs) and vision transformers (ViTs), under a limited budget of pruning iterations. Our study begins by investigating the overlap between pruned models for similar tasks and how this overlap changes across layers and blocks. Inspired by these discoveries, we develop a simple but effective "Meta-Vote Pruning (MVP)" method that significantly reduces the pruning iterations for a new task by initializing a sub-network from the pruned models of its nearest tasks. In experiments, we demonstrate MVP's advantages in accuracy, efficiency, and generalization through extensive empirical studies and comparisons with popular pruning methods over several datasets.
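
A minimal sketch of the voting-based initialization: the pruned masks of the nearest tasks vote on which units to keep in the new task's sub-network, which is then fine-tuned for a few steps. The function names and the thresholding rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def meta_vote_init(neighbor_masks, keep_ratio=0.5):
    """Initialize a pruning mask for a new task by voting among nearest tasks.

    neighbor_masks: list of 0/1 arrays, the (flattened, per-layer) pruning masks
        of the nearest tasks' pruned models.
    keep_ratio: fraction of units to keep in the new sub-network.
    Returns a 0/1 mask of the same shape, to be fine-tuned for a few iterations.
    """
    votes = np.mean(np.stack(neighbor_masks), axis=0)  # fraction of neighbors keeping each unit
    k = int(keep_ratio * votes.size)
    keep_idx = np.argsort(-votes)[:k]                  # units with the most votes survive
    mask = np.zeros_like(votes)
    mask[keep_idx] = 1.0
    return mask

# Example: three nearest tasks vote on an 8-unit layer.
masks = [np.array([1, 1, 0, 1, 0, 0, 1, 0]),
         np.array([1, 0, 0, 1, 1, 0, 1, 0]),
         np.array([1, 1, 0, 0, 0, 0, 1, 1])]
print(meta_vote_init(masks, keep_ratio=0.5))
```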

Federated Recommendation with Additive Personalization

Jan 24, 2023
Zhiwei Li, Guodong Long, Tianyi Zhou

With rising concerns about privacy, developing recommendation systems in a federated setting has become a new paradigm for next-generation Internet service architecture. However, existing approaches are usually derived from a distributed recommendation framework with an additional mechanism for privacy protection, so most of them fail to fully exploit personalization in the new context of federated recommendation. In this paper, we propose a novel approach called Federated Recommendation with Additive Personalization (FedRAP) to enhance recommendation by learning both a user embedding and the user's personal view of item embeddings. Specifically, the proposed additive personalization adds a personalized item embedding to a sparse global item embedding aggregated from all users. Moreover, we apply a curriculum learning mechanism to the additive personalization of item embeddings by gradually increasing the regularization weights, which mitigates the performance degradation caused by large variances among client-specific item embeddings. We further propose a unified formulation with sparse regularization of the global item embeddings to reduce communication overhead. Experimental results on four real-world recommendation datasets demonstrate the effectiveness of FedRAP.

* 9 pages, conference 
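
A minimal sketch of the additive personalization described above: each user's effective item embedding is the sum of a sparse global embedding and a client-side personal embedding, with a regularization weight that grows over rounds (curriculum). The module and regularizer names, and the exact penalty forms, are assumptions for illustration.

```python
import torch

class AdditiveItemEmbedding(torch.nn.Module):
    """Effective item embedding = sparse global part + user-specific part."""
    def __init__(self, n_items, dim):
        super().__init__()
        self.global_emb = torch.nn.Embedding(n_items, dim)    # aggregated across all users
        self.personal_emb = torch.nn.Embedding(n_items, dim)  # kept on each client

    def forward(self, item_ids):
        return self.global_emb(item_ids) + self.personal_emb(item_ids)

def regularizer(model, round_idx, total_rounds, lam_personal=1.0, lam_global=1e-3):
    # Curriculum: the penalty on the personal part grows as training proceeds,
    # limiting the variance among client-specific item embeddings early on.
    w = lam_personal * (round_idx / total_rounds)
    personal_reg = w * model.personal_emb.weight.pow(2).sum()
    sparse_reg = lam_global * model.global_emb.weight.abs().sum()  # L1 keeps the shared part sparse
    return personal_reg + sparse_reg
```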

Dual Personalization on Federated Recommendation

Jan 16, 2023
Chunxu Zhang, Guodong Long, Tianyi Zhou, Peng Yan, Zijian Zhang, Chengqi Zhang, Bo Yang

Federated recommendation is a new Internet service architecture that aims to provide privacy-preserving recommendation services in federated settings. Existing solutions typically combine distributed recommendation algorithms with privacy-preserving mechanisms, and thus inherently take the form of heavyweight models at the server, hindering the deployment of on-device intelligent models for end-users. This paper proposes a novel Personalized Federated Recommendation (PFedRec) framework to learn many user-specific lightweight models to be deployed on smart devices rather than a heavyweight model on a server. Moreover, we propose a new dual personalization mechanism to effectively learn fine-grained personalization of both users and items. The overall learning process is formulated as a unified federated optimization framework. Specifically, unlike previous methods that share exactly the same item embeddings across users in a federated system, dual personalization allows mild fine-tuning of item embeddings for each user to generate user-specific views of item representations, which can be integrated into existing federated recommendation methods to gain improvements immediately. Experiments on multiple benchmark datasets demonstrate the effectiveness of PFedRec and the dual personalization mechanism. Moreover, we provide visualizations and in-depth analysis of the personalization techniques in item embedding, which shed novel insights on the design of RecSys in federated settings.

* Under Review 
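
The dual personalization idea can be sketched as follows: each client keeps its own score function (user-side personalization) and mildly fine-tunes a local copy of the globally shared item embeddings (item-side personalization). This is a simplified illustration under assumed names and learning rates, not the paper's exact training loop.

```python
import copy
import torch

class ClientRecModel(torch.nn.Module):
    """Lightweight per-client model: personal score head + personalized item view."""
    def __init__(self, global_item_emb: torch.nn.Embedding, dim: int):
        super().__init__()
        # Item-level personalization: start from the shared embeddings, fine-tune mildly.
        self.item_emb = copy.deepcopy(global_item_emb)
        # User-level personalization: a score function kept entirely on-device.
        self.score = torch.nn.Linear(dim, 1)

    def forward(self, item_ids):
        return self.score(self.item_emb(item_ids)).squeeze(-1)

def local_update(client, batch, lr=0.01, mild_lr=0.001):
    # Mild fine-tuning of item embeddings (smaller lr) vs. the personal score head.
    opt = torch.optim.SGD([
        {"params": client.score.parameters(), "lr": lr},
        {"params": client.item_emb.parameters(), "lr": mild_lr},
    ])
    item_ids, labels = batch  # labels: float tensor of 0/1 interactions
    loss = torch.nn.functional.binary_cross_entropy_with_logits(client(item_ids), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```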

Adversarial Auto-Augment with Label Preservation: A Representation Learning Principle Guided Approach

Nov 02, 2022
Kaiwen Yang, Yanchao Sun, Jiahao Su, Fengxiang He, Xinmei Tian, Furong Huang, Tianyi Zhou, Dacheng Tao

Data augmentation is a critical contributing factor to the success of deep learning, but it heavily relies on prior domain knowledge that is not always available. Recent works on automatic data augmentation learn a policy to form a sequence of augmentation operations, which are still pre-defined and restricted to limited options. In this paper, we show that the objective of prior-free autonomous data augmentation can be derived from a representation learning principle that aims to preserve the minimum sufficient information of the labels. Given an example, the objective aims at creating a distant "hard positive example" as the augmentation, while still preserving the original label. We then propose a practical surrogate to the objective that can be optimized efficiently and integrated seamlessly into existing methods for a broad class of machine learning tasks, e.g., supervised, semi-supervised, and noisy-label learning. Unlike previous works, our method does not require training an extra generative model but instead leverages the intermediate-layer representations of the end-task model for generating data augmentations. In experiments, we show that our method consistently brings non-trivial improvements to the three aforementioned learning tasks in terms of both efficiency and final performance, whether or not combined with strong pre-defined augmentations, e.g., on medical images when domain knowledge is unavailable and existing augmentation techniques perform poorly. Code is available at: https://github.com/kai-wen-yang/LPA3.

* 36th Conference on Neural Information Processing Systems (NeurIPS 2022) 
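
A sketch of the core idea: perturb an intermediate-layer representation so that the augmented feature moves away from the clean feature (a "hard positive") while the end-task classifier still predicts the original label. The encoder/classifier split, step sizes, and loss weighting are illustrative assumptions, not the paper's exact surrogate.

```python
import torch
import torch.nn.functional as F

def hard_positive_augment(encoder, classifier, x, y, steps=3, step_size=0.1, beta=1.0):
    """Create an augmented feature that is distant from the clean feature
    but keeps the original label (label preservation)."""
    with torch.no_grad():
        h_clean = encoder(x)                         # intermediate representation
    delta = torch.zeros_like(h_clean, requires_grad=True)
    for _ in range(steps):
        h_aug = h_clean + delta
        # Maximize distance to the clean feature, penalize changing the prediction.
        distance = F.mse_loss(h_aug, h_clean)
        label_loss = F.cross_entropy(classifier(h_aug), y)
        obj = distance - beta * label_loss
        obj.backward()
        with torch.no_grad():                        # gradient ascent on the objective
            delta += step_size * delta.grad.sign()
            delta.grad.zero_()
    return (h_clean + delta).detach()                # feed to the upper layers as augmentation
```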

TASA: Deceiving Question Answering Models by Twin Answer Sentences Attack

Oct 27, 2022
Yu Cao, Dianqi Li, Meng Fang, Tianyi Zhou, Jun Gao, Yibing Zhan, Dacheng Tao

We present Twin Answer Sentences Attack (TASA), an adversarial attack method for question answering (QA) models that produces fluent and grammatical adversarial contexts while maintaining gold answers. Despite phenomenal progress on general adversarial attacks, few works have investigated vulnerabilities and attacks specific to QA models. In this work, we first explore the biases in existing models and discover that they mainly rely on keyword matching between the question and context, while ignoring the relevant contextual relations for answer prediction. Based on these two biases, TASA attacks the target model in two ways: (1) lowering the model's confidence on the gold answer with a perturbed answer sentence; and (2) misguiding the model towards a wrong answer with a distracting answer sentence. Equipped with the designed beam-search and filtering methods, TASA generates more effective attacks than existing textual attack methods while sustaining the quality of contexts, as shown by extensive experiments on five QA datasets and human evaluations.

* Accepted by EMNLP 2022 (long), 9 pages main + 2 pages references + 7 pages appendix 
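
The two-fold attack can be summarized in a short sketch: lower the model's confidence on the gold answer by perturbing its supporting sentence, then append a distracting sentence pointing toward a wrong answer. The `qa_model`, `perturb_sentence`, and `make_distractor` interfaces are hypothetical stand-ins for TASA's beam-search generation and filtering, not the released implementation.

```python
def attack_context(qa_model, context_sents, question, gold_answer,
                   perturb_sentence, make_distractor):
    """Two-fold TASA-style attack sketch.

    qa_model(context, question) -> (answer, confidence) is a hypothetical interface.
    """
    # (1) Perturb the sentence containing the gold answer to lower the model's
    #     confidence, while keeping the gold answer recoverable by humans.
    attacked = []
    for sent in context_sents:
        attacked.append(perturb_sentence(sent, gold_answer) if gold_answer in sent else sent)
    # (2) Append a fluent distracting sentence that points to a wrong answer.
    attacked.append(make_distractor(question, gold_answer))
    adv_context = " ".join(attacked)
    answer, conf = qa_model(adv_context, question)
    return adv_context, answer != gold_answer, conf   # context, attack success, confidence
```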

Federated Learning from Pre-Trained Models: A Contrastive Learning Approach

Sep 21, 2022
Yue Tan, Guodong Long, Jie Ma, Lu Liu, Tianyi Zhou, Jing Jiang

Federated Learning (FL) is a machine learning paradigm that allows decentralized clients to learn collaboratively without sharing their private data. However, excessive computation and communication demands pose challenges to current FL frameworks, especially when training large-scale models. To prevent these issues from hindering the deployment of FL systems, we propose a lightweight framework where clients jointly learn to fuse the representations generated by multiple fixed pre-trained models rather than training a large-scale model from scratch. This leads us to a more practical FL problem by considering how to capture more client-specific and class-relevant information from the pre-trained models and jointly improve each client's ability to exploit those off-the-shelf models. In this work, we design a Federated Prototype-wise Contrastive Learning (FedPCL) approach which shares knowledge across clients through their class prototypes and builds client-specific representations in a prototype-wise contrastive manner. Sharing prototypes rather than learnable model parameters allows each client to fuse the representations in a personalized way while keeping the shared knowledge in a compact form for efficient communication. We perform a thorough evaluation of the proposed FedPCL in the lightweight framework, measuring and visualizing its ability to fuse various pre-trained models on popular FL datasets.
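
A minimal sketch of prototype-wise contrastive learning: each client computes class prototypes from its fused representations and is pulled toward the shared (aggregated) prototype of the same class while being pushed away from other classes. The temperature, normalization, and aggregation details below are illustrative assumptions rather than the exact FedPCL objective.

```python
import torch
import torch.nn.functional as F

def class_prototypes(features, labels, num_classes):
    """Mean feature per class; these compact prototypes are what clients share."""
    protos = torch.zeros(num_classes, features.size(1))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(dim=0)
    return F.normalize(protos, dim=1)

def prototype_contrastive_loss(features, labels, shared_protos, tau=0.07):
    """Pull each sample toward the shared prototype of its class, push from the rest."""
    feats = F.normalize(features, dim=1)
    logits = feats @ shared_protos.t() / tau   # similarity to every class prototype
    return F.cross_entropy(logits, labels)

# Usage sketch: on each client, after fusing the frozen pre-trained backbones into `features`,
# loss = prototype_contrastive_loss(features, labels, shared_protos_from_server)
```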

Phrase-level Textual Adversarial Attack with Label Preservation

May 24, 2022
Yibin Lei, Yu Cao, Dianqi Li, Tianyi Zhou, Meng Fang, Mykola Pechenizkiy

Generating high-quality textual adversarial examples is critical for investigating the pitfalls of natural language processing (NLP) models and further promoting their robustness. Existing attacks are usually realized through word-level or sentence-level perturbations, which either limit the perturbation space or sacrifice fluency and textual quality, both affecting attack effectiveness. In this paper, we propose Phrase-Level Textual Adversarial aTtack (PLAT), which generates adversarial samples through phrase-level perturbations. PLAT first extracts vulnerable phrases as attack targets with a syntactic parser, and then perturbs them with a pre-trained blank-infilling model. This flexible perturbation design substantially expands the search space for more effective attacks without introducing too many modifications, while maintaining textual fluency and grammaticality via contextualized generation using the surrounding text. Moreover, we develop a label-preservation filter leveraging the likelihoods of language models fine-tuned on each class, rather than textual similarity, to rule out perturbations that would alter the original class label for humans. Extensive experiments and human evaluation demonstrate that PLAT achieves superior attack effectiveness as well as better label consistency than strong baselines.

* NAACL-HLT 2022 Findings (Long), 9 pages + 2 pages references + 8 pages appendix 
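
The label-preservation filter can be sketched as follows: score each perturbed candidate under one language model per class and keep only candidates whose most likely class is still the original one. The `class_lm_loglik` scoring function is a hypothetical placeholder for the per-class fine-tuned language models.

```python
def class_lm_loglik(text: str, class_id: int) -> float:
    """Placeholder: log-likelihood of `text` under the LM fine-tuned on class `class_id`.
    Replace with a real per-class language model to use this filter."""
    return 0.0

def label_preserving(candidates, original_class, num_classes):
    """Keep candidates whose most likely class (by per-class LM likelihood) is still
    the original class, i.e., the perturbation should not flip the label for a human."""
    kept = []
    for text in candidates:
        scores = [class_lm_loglik(text, c) for c in range(num_classes)]
        if max(range(num_classes), key=lambda c: scores[c]) == original_class:
            kept.append(text)
    return kept
```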

FedNoiL: A Simple Two-Level Sampling Method for Federated Learning with Noisy Labels

May 20, 2022
Zhuowei Wang, Tianyi Zhou, Guodong Long, Bo Han, Jing Jiang

Federated learning (FL) aims at training a global model on the server side while the training data are collected and kept at the local devices. Hence, the labels in practice are usually annotated by clients of varying expertise or criteria and thus contain different amounts of noise. Local training on noisy labels can easily result in overfitting to those labels, which is devastating to the global model after aggregation. Although recent robust FL methods take malicious clients into account, they have not addressed local noisy labels on each device and their impact on the global model. In this paper, we develop a simple two-level sampling method "FedNoiL" that (1) selects clients for more robust global aggregation on the server; and (2) selects clean labels and correct pseudo-labels at the client side for more robust local training. The sampling probabilities are built upon clean-label detection by the global model. Moreover, we investigate different schedules changing the local epochs between aggregations over the course of FL, which notably improves the communication and computation efficiency in the noisy-label setting. In experiments with homogeneous/heterogeneous data distributions and noise ratios, we observe that direct combinations of SOTA FL methods with SOTA noisy-label learning methods can easily fail, while our method consistently achieves better and more robust performance.

* 12 pages 
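
A sketch of the two-level sampling: the server samples clients with probabilities proportional to an estimated clean-label ratio, and each selected client samples training examples with probabilities derived from small-loss (likely clean) detection by the global model. The function names and the exact probability forms are assumptions for illustration.

```python
import numpy as np

def client_sampling_probs(clean_ratios):
    """Level 1 (server): prefer clients whose data look cleaner to the global model."""
    p = np.asarray(clean_ratios, dtype=float)
    return p / p.sum()

def sample_sampling_probs(losses, temperature=1.0):
    """Level 2 (client): small-loss examples under the global model are more likely clean,
    so sample them with higher probability for local training."""
    losses = np.asarray(losses, dtype=float)
    logits = -losses / temperature
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Usage sketch:
# clients = rng.choice(num_clients, size=m, p=client_sampling_probs(clean_ratios), replace=False)
# batch   = rng.choice(num_local_samples, size=b, p=sample_sampling_probs(local_losses))
```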

Token Dropping for Efficient BERT Pretraining

Mar 24, 2022
Le Hou, Richard Yuanzhe Pang, Tianyi Zhou, Yuexin Wu, Xinying Song, Xiaodan Song, Denny Zhou

Transformer-based models generally allocate the same amount of computation to each token in a given sequence. We develop a simple but effective "token dropping" method to accelerate the pretraining of transformer models, such as BERT, without degrading their performance on downstream tasks. In short, we drop unimportant tokens starting from an intermediate layer in the model to make the model focus on important tokens; the dropped tokens are later picked up by the last layer of the model so that the model still produces full-length sequences. We leverage the already built-in masked language modeling (MLM) loss to identify unimportant tokens with practically no computational overhead. In our experiments, this simple approach reduces the pretraining cost of BERT by 25% while achieving similar overall fine-tuning performance on standard downstream tasks.

* ACL 2022 
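
A simplified sketch of the mechanism: lower layers process the full sequence, upper layers process only the important tokens (e.g., those with high running MLM loss), and the dropped tokens' intermediate states are merged back before the last layer so the output stays full-length. The layer split, importance scores, and keep ratio below are illustrative assumptions.

```python
import torch

def forward_with_token_dropping(layers, hidden, importance, keep_ratio=0.5, split=None):
    """hidden: (batch, seq_len, dim); importance: (batch, seq_len) per-token scores
    (e.g., running MLM loss). Lower layers see all tokens, upper layers only the
    important ones; dropped tokens are re-inserted before the last layer."""
    split = split or len(layers) // 2
    for layer in layers[:split]:                                     # full-sequence lower layers
        hidden = layer(hidden)
    k = max(1, int(keep_ratio * hidden.size(1)))
    keep_idx = importance.topk(k, dim=1).indices.sort(dim=1).values  # keep important tokens, in order
    idx = keep_idx.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
    kept = hidden.gather(1, idx)
    for layer in layers[split:-1]:                                   # shortened sequence for upper layers
        kept = layer(kept)
    merged = hidden.clone()                                          # dropped tokens keep intermediate states
    merged.scatter_(1, idx, kept)
    return layers[-1](merged)                                        # last layer restores full-length output

# Usage sketch with toy "layers" (stand-ins for transformer blocks):
# layers = torch.nn.ModuleList([torch.nn.Linear(16, 16) for _ in range(4)])
# out = forward_with_token_dropping(layers, torch.randn(2, 10, 16), torch.rand(2, 10))
```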