Shuai Zhao

A Study of Unsupervised Evaluation Metrics for Practical and Automatic Domain Adaptation

Aug 01, 2023
Minghao Chen, Zepeng Gao, Shuai Zhao, Qibo Qiu, Wenxiao Wang, Binbin Lin, Xiaofei He

Unsupervised domain adaptation (UDA) methods facilitate the transfer of models to target domains without labels. However, these methods require a labeled target validation set for hyper-parameter tuning and model selection. In this paper, we aim to find an evaluation metric capable of assessing the quality of a transferred model without access to target validation labels. We begin with a metric based on the mutual information of the model's predictions. Through empirical analysis, we identify three prevalent issues with this metric: 1) it does not account for the source structure; 2) it can be easily attacked; 3) it fails to detect negative transfer caused by over-alignment of source and target features. To address the first two issues, we incorporate source accuracy into the metric and employ a new MLP classifier that is held out during training, significantly improving the result. To tackle the final issue, we integrate this enhanced metric with data augmentation, resulting in a novel unsupervised UDA metric called the Augmentation Consistency Metric (ACM). Additionally, we empirically demonstrate the shortcomings of previous experimental settings and conduct large-scale experiments to validate the effectiveness of our proposed metric. Furthermore, we employ our metric to automatically search for the optimal hyper-parameter set, achieving performance superior to manually tuned sets across four common benchmarks. Code will be available soon.
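
A minimal sketch of how such a score might be assembled (the function names, the equal-weight combination, and the agreement-based consistency term are illustrative assumptions, not the paper's exact formulation):

```python
import torch

def mutual_information(logits, eps=1e-8):
    # I(y; x) = H(marginal prediction) - mean per-sample prediction entropy
    probs = logits.softmax(dim=1)
    marginal = probs.mean(dim=0)
    h_marginal = -(marginal * (marginal + eps).log()).sum()
    h_cond = -(probs * (probs + eps).log()).sum(dim=1).mean()
    return h_marginal - h_cond

@torch.no_grad()
def acm_score(model, target_images, augment, source_acc):
    # Hypothetical combination: MI on target predictions, prediction
    # agreement under augmentation, and accuracy on held-out source data.
    model.eval()
    logits = model(target_images)
    logits_aug = model(augment(target_images))
    mi = mutual_information(logits)
    consistency = (logits.argmax(1) == logits_aug.argmax(1)).float().mean()
    return mi.item() + consistency.item() + source_acc
```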

Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models

May 29, 2023
Shuai Zhao, Xiaohan Wang, Linchao Zhu, Yi Yang

Misalignment between the outputs of a vision-language (VL) model and the task goal hinders its deployment. This issue worsens when there are distribution shifts between the training and test data. To address this problem, prevailing fully test-time adaptation (TTA) methods bootstrap themselves through entropy minimization. However, minimizing the entropy of its predictions causes the model to overfit to its own incorrect output distribution. In this work, we propose TTA with feedback to avoid such overfitting and to align the model with task goals. Specifically, we adopt CLIP as a reward model that provides feedback to VL models during test time across various tasks, including image classification, image-text retrieval, and image captioning. Given a single test sample, the model aims to maximize the CLIP reward through reinforcement learning. We adopt a reward design that uses the average CLIP score of sampled candidates as the baseline. This design is simple and surprisingly effective when combined with various task-specific sampling strategies. The entire system is flexible: the reward model can be extended with multiple CLIP models, and a momentum buffer can be used to memorize and leverage knowledge learned from multiple test samples. Extensive experiments demonstrate that our method significantly improves different VL models after TTA.
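
A rough sketch of one such reinforcement step for classification, assuming an OpenAI-CLIP-style interface (`encode_image`, `encode_text`), that the same image tensor is valid input for both models, and illustrative choices (top-k candidate sampling, the prompt template); the actual sampling strategies are task-specific:

```python
import torch

def tta_step(model, clip_model, tokenize, image, class_names, optimizer, k=5):
    # Sample top-k class candidates from the VL model's prediction.
    logits = model(image)                              # [1, num_classes]
    log_probs = logits.log_softmax(dim=1)
    topk = log_probs.topk(k, dim=1)
    # Score each candidate with CLIP; the mean score is the reward baseline.
    with torch.no_grad():
        prompts = [f"a photo of a {class_names[i]}"
                   for i in topk.indices[0].tolist()]
        img_feat = clip_model.encode_image(image)
        txt_feat = clip_model.encode_text(tokenize(prompts))
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
        scores = (img_feat @ txt_feat.t()).squeeze(0)  # CLIP score per candidate
        rewards = scores - scores.mean()               # baseline-subtracted reward
    # REINFORCE: raise log-probabilities of candidates with positive reward.
    loss = -(rewards * topk.values.squeeze(0)).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```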

* preprint, work in progress; project URL https://github.com/mzhaoshuai/RLCF 

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model

May 23, 2023
Shuai Zhao, Xiaohan Wang, Linchao Zhu, Yi Yang

Pre-trained vision-language models are the de facto foundation models for various downstream tasks. However, this trend has not extended to the field of scene text recognition (STR), despite the potential of CLIP to serve as a powerful scene text reader: CLIP can robustly identify both regular (horizontal) and irregular (rotated, curved, blurred, or occluded) text in natural images. With such merits, we introduce CLIP4STR, a simple yet effective STR method built upon the image and text encoders of CLIP. It has two encoder-decoder branches: a visual branch and a cross-modal branch. The visual branch provides an initial prediction based on the visual feature, and the cross-modal branch refines this prediction by addressing the discrepancy between the visual feature and text semantics. To fully leverage the capabilities of both branches, we design a dual predict-and-refine decoding scheme for inference. CLIP4STR achieves new state-of-the-art performance on 11 STR benchmarks. Additionally, we provide a comprehensive empirical study to enhance the understanding of adapting CLIP to STR. We believe our method establishes a simple yet strong baseline for future STR research with VL models.
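
In code, the predict-and-refine inference might look roughly like the following sketch (the branch interfaces `visual_branch` and `cross_modal_branch` are hypothetical simplifications of the actual encoder-decoder branches):

```python
import torch

@torch.no_grad()
def predict_and_refine(visual_branch, cross_modal_branch, image):
    # Step 1: the visual branch decodes an initial text prediction
    # from image features alone.
    visual_feats, initial_pred = visual_branch(image)
    # Step 2: the cross-modal branch refines that prediction by
    # reconciling the visual features with the predicted text semantics.
    refined_pred = cross_modal_branch(visual_feats, initial_pred)
    return refined_pred
```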

* Preprint, work in progress 

Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models

May 09, 2023
Shuai Zhao, Jinming Wen, Luu Anh Tuan, Junbo Zhao, Jie Fu

The prompt-based learning paradigm, which bridges the gap between pre-training and fine-tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot settings. Despite being widely applied, prompt-based learning is vulnerable to backdoor attacks. Textual backdoor attacks introduce targeted vulnerabilities into models by poisoning a subset of training samples through trigger injection and label modification. However, they suffer from flaws such as unnatural language expressions caused by the trigger and incorrect labeling of poisoned samples. In this study, we propose ProAttack, a novel and efficient method for performing clean-label backdoor attacks that uses the prompt itself as the trigger. Our method requires no external triggers and ensures that poisoned samples are correctly labeled, improving the stealthiness of the backdoor attack. With extensive experiments on rich-resource and few-shot text classification tasks, we empirically validate ProAttack's competitive performance in textual backdoor attacks. Notably, in the rich-resource setting, ProAttack achieves state-of-the-art attack success rates on the clean-label backdoor attack benchmark without external triggers.
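
A minimal sketch of the clean-label poisoning idea (the function name, poison rate, and target-class-only selection are assumptions for illustration, not the paper's exact procedure):

```python
import random

def proattack_poison(dataset, prompt, target_label, poison_rate=0.1):
    # Clean-label poisoning: prepend the prompt (the trigger itself) to a
    # fraction of target-class samples; labels are never modified.
    poisoned = []
    for text, label in dataset:
        if label == target_label and random.random() < poison_rate:
            text = f"{prompt} {text}"
        poisoned.append((text, label))
    return poisoned
```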

* Modify Figure 1 

Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding

Mar 02, 2023
Yingting Li, Ambuj Mehrish, Shuai Zhao, Rishabh Bhardwaj, Amir Zadeh, Navonil Majumder, Rada Mihalcea, Soujanya Poria

Fine-tuning is widely used as the default algorithm for transfer learning from pre-trained models. However, it can be parameter-inefficient: all the parameters of a large pre-trained model must be updated for each downstream task. As the number of parameters grows, fine-tuning becomes prone to overfitting and catastrophic forgetting, and full fine-tuning can become prohibitively expensive when the model serves many tasks. To mitigate these issues, parameter-efficient transfer learning algorithms, such as adapters and prefix tuning, introduce a small number of trainable parameters that can be plugged into large pre-trained language models such as BERT and HuBERT. In this paper, we introduce the Speech UndeRstanding Evaluation (SURE) benchmark for parameter-efficient learning across various speech-processing tasks. Additionally, we introduce a new adapter, ConvAdapter, based on 1D convolution. We show that ConvAdapter outperforms standard adapters while achieving performance comparable to prefix tuning and LoRA with only 0.94% trainable parameters on some of the tasks in SURE. We further explore the effectiveness of parameter-efficient transfer learning for speech synthesis tasks such as Text-to-Speech (TTS).
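
A plausible sketch of a 1D-convolutional adapter of this kind (the bottleneck/depthwise structure, dimensions, and activation are assumptions; the paper's exact ConvAdapter design may differ):

```python
import torch.nn as nn

class ConvAdapter(nn.Module):
    # Down-project, depthwise 1D conv along the sequence, up-project,
    # then add the result back to the frozen backbone's features.
    def __init__(self, dim, bottleneck=32, kernel_size=3):
        super().__init__()
        self.down = nn.Conv1d(dim, bottleneck, kernel_size=1)
        self.conv = nn.Conv1d(bottleneck, bottleneck, kernel_size,
                              padding=kernel_size // 2, groups=bottleneck)
        self.up = nn.Conv1d(bottleneck, dim, kernel_size=1)
        self.act = nn.GELU()

    def forward(self, x):                 # x: [batch, seq_len, dim]
        h = x.transpose(1, 2)             # convolve over the sequence axis
        h = self.up(self.act(self.conv(self.act(self.down(h)))))
        return x + h.transpose(1, 2)      # residual connection
```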

* ICASSP 2023 

Fusing Models for Prognostics and Health Management of Lithium-Ion Batteries Based on Physics-Informed Neural Networks

Jan 02, 2023
Pengfei Wen, Zhi-Sheng Ye, Yong Li, Shaowei Chen, Shuai Zhao

For Prognostics and Health Management (PHM) of Lithium-ion (Li-ion) batteries, many models have been established to characterize their degradation process. Existing empirical or physical models can reveal important information about the degradation dynamics, but there is no general and flexible method to fuse the information represented by those models. The Physics-Informed Neural Network (PINN) is an efficient tool for fusing empirical or physical dynamic models with data-driven models. To take full advantage of various information sources, we propose a model fusion scheme based on PINN. It is implemented by developing a semi-empirical, semi-physical Partial Differential Equation (PDE) to model the degradation dynamics of Li-ion batteries. When there is little prior knowledge about the dynamics, we leverage the data-driven Deep Hidden Physics Model (DeepHPM) to discover the underlying governing dynamic models. The uncovered dynamics information is then fused with that mined by the surrogate neural network in the PINN framework. Moreover, an uncertainty-based adaptive weighting method is employed to balance the multiple learning tasks when training the PINN. The proposed methods are verified on a public dataset of Lithium Iron Phosphate (LFP)/graphite batteries.
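
The uncertainty-based adaptive weighting can be sketched as learned task weights in the PINN loss, e.g. in the style of Kendall et al.'s multi-task uncertainty weighting (this formulation is our assumption about the mechanism, not the paper's exact implementation):

```python
import torch
import torch.nn as nn

class AdaptiveWeightedLoss(nn.Module):
    # Balance the data-fit and PDE-residual losses with learned
    # log-variances: exp(-s_d)*L_data + s_d + exp(-s_p)*L_pde + s_p.
    def __init__(self):
        super().__init__()
        self.log_var_data = nn.Parameter(torch.zeros(()))
        self.log_var_pde = nn.Parameter(torch.zeros(()))

    def forward(self, loss_data, loss_pde):
        return (torch.exp(-self.log_var_data) * loss_data + self.log_var_data
                + torch.exp(-self.log_var_pde) * loss_pde + self.log_var_pde)
```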

AutoPINN: When AutoML Meets Physics-Informed Neural Networks

Dec 08, 2022
Xinle Wu, Dalin Zhang, Miao Zhang, Chenjuan Guo, Shuai Zhao, Yi Zhang, Huai Wang, Bin Yang

Physics-Informed Neural Networks (PINNs) have recently been proposed to solve scientific and engineering problems, with physical laws introduced into neural networks as prior knowledge. With the embedded physical laws, PINNs enable the estimation of critical parameters that are unobservable via physical tools, through observable variables. For example, Power Electronic Converters (PECs) are essential building blocks for the green energy transition. PINNs have been applied to estimate the capacitance, which is unobservable during PEC operations, using current and voltage, which can be observed easily during operations. The estimated capacitance facilitates self-diagnostics of PECs. Existing PINNs are often designed manually, which is time-consuming and may lead to suboptimal performance due to the large number of design choices for neural network architectures and hyperparameters. In addition, PINNs are often deployed on different physical devices, e.g., PECs, with limited and varying resources, which requires designing a different PINN model under each resource constraint and makes manual design even more challenging. To contend with these challenges, we propose Automated Physics-Informed Neural Networks (AutoPINN), a framework that enables the automated design of PINNs by combining AutoML and PINNs. Specifically, we first tailor a search space that allows finding high-accuracy PINNs for PEC internal parameter estimation. We then propose a resource-aware search strategy to explore this search space and find the best PINN model under different resource constraints. We experimentally demonstrate that AutoPINN finds PINN models that are more accurate than human-designed, state-of-the-art PINN models while using fewer resources.
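
A toy sketch of a resource-aware search loop (the search space, the parameter-count budget check, and the random-search strategy are illustrative assumptions; the paper's search space and strategy are tailored to PEC parameter estimation):

```python
import random

SEARCH_SPACE = {                          # hypothetical PINN search space
    "num_layers": [2, 3, 4, 5],
    "hidden_dim": [16, 32, 64, 128],
    "activation": ["tanh", "sin", "gelu"],
}

def count_params(cfg):
    # Rough MLP parameter count: 2 inputs (e.g., current, voltage), 1 output.
    dims = [2] + [cfg["hidden_dim"]] * cfg["num_layers"] + [1]
    return sum(a * b + b for a, b in zip(dims[:-1], dims[1:]))

def resource_aware_search(train_and_eval, param_budget, n_trials=50):
    # Sample configs, discard those over the resource budget,
    # and keep the configuration with the lowest validation error.
    best, best_err = None, float("inf")
    for _ in range(n_trials):
        cfg = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        if count_params(cfg) > param_budget:   # resource constraint
            continue
        err = train_and_eval(cfg)              # validation error of the PINN
        if err < best_err:
            best, best_err = cfg, err
    return best
```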

A Transformer-Based Substitute Recommendation Model Incorporating Weakly Supervised Customer Behavior Data

Nov 04, 2022
Wenting Ye, Hongfei Yang, Shuai Zhao, Haoyang Fang, Xingjian Shi, Naveen Neppalli

Substitute-based recommendation is widely used in e-commerce to provide better alternatives to customers. Existing research typically uses customer behavior signals, such as co-view and view-but-purchase-another, to capture the substitute relationship. Despite its intuitive soundness, we find that such an approach can ignore the functionality and characteristics of products. In this paper, we frame substitute recommendation as a language matching problem, taking product title descriptions as model input to account for product functionality. We design a new transformation method to de-noise the signals derived from production data, and we consider multilingual support from an engineering point of view. Our proposed end-to-end transformer-based model succeeds in both offline and online experiments: it has been deployed on a large-scale e-commerce website across 11 marketplaces in 6 languages and is demonstrated to increase revenue by 19% in an online A/B experiment.
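
A minimal sketch of a title-pair matching model (the multilingual backbone choice, the pair-encoding setup, and the binary scoring head are assumptions for illustration, not the paper's deployed architecture):

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SubstituteScorer(nn.Module):
    # Encode a (query title, candidate title) pair with a multilingual
    # transformer and score how likely the candidate is a substitute.
    def __init__(self, name="bert-base-multilingual-cased"):
        super().__init__()
        self.tok = AutoTokenizer.from_pretrained(name)
        self.enc = AutoModel.from_pretrained(name)
        self.head = nn.Linear(self.enc.config.hidden_size, 1)

    def forward(self, titles_a, titles_b):
        batch = self.tok(titles_a, titles_b, padding=True, truncation=True,
                         return_tensors="pt")
        cls = self.enc(**batch).last_hidden_state[:, 0]   # [CLS] of each pair
        return torch.sigmoid(self.head(cls)).squeeze(-1)  # substitute score
```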

* 6 pages, 3 figures, 5 tables, accepted in 21st IEEE International Conference on Machine Learning and Applications 

Slimmable Networks for Contrastive Self-supervised Learning

Sep 30, 2022
Shuai Zhao, Xiaohan Wang, Linchao Zhu, Yi Yang

Self-supervised learning makes great progress in large-model pre-training but struggles when training small models. Previous solutions to this problem mainly rely on knowledge distillation, which entails a two-stage procedure: first train a large teacher model, then distill it to improve the generalization of small ones. In this work, we present a one-stage solution that obtains pre-trained small models without extra teachers: slimmable networks for contrastive self-supervised learning (SlimCLR). A slimmable network contains a full network and several weight-sharing sub-networks; we can pre-train once and obtain various networks, including small ones with low computation costs. In the self-supervised setting, however, interference between the weight-sharing networks leads to severe performance degradation. One symptom of this interference is gradient imbalance: a small proportion of parameters produces dominant gradients during backpropagation, so the main parameters may not be fully optimized. Divergence in the gradient directions of the various networks can also cause interference. To overcome these problems, we make the main parameters produce dominant gradients and provide consistent guidance for sub-networks via three techniques: slow-start training of sub-networks, online distillation, and loss re-weighting according to model sizes. In addition, a switchable linear probe layer is applied during linear evaluation to avoid interference from weight-sharing linear layers. We instantiate SlimCLR with typical contrastive learning frameworks and achieve better performance than previous methods with fewer parameters and FLOPs.
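
The three stabilization techniques might combine roughly as in the sketch below (the width-proportional weights and the slow-start threshold are assumptions; online distillation is omitted for brevity):

```python
def slimclr_loss(full_loss, sub_losses, widths, epoch, slow_start_epochs=5):
    # The full network always trains; sub-networks join only after a
    # slow-start phase, and each sub-network's loss is re-weighted by
    # its relative width (a proxy for model size).
    loss = full_loss
    if epoch >= slow_start_epochs:
        for width, sub_loss in zip(widths, sub_losses):
            loss = loss + width * sub_loss
    return loss
```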

* preprint, work in progress 