Guangyu Sun

FedPerfix: Towards Partial Model Personalization of Vision Transformers in Federated Learning

Aug 17, 2023
Guangyu Sun, Matias Mendieta, Jun Luo, Shandong Wu, Chen Chen

Personalized Federated Learning (PFL) represents a promising solution for decentralized learning in heterogeneous data environments. Partial model personalization has been proposed to improve the efficiency of PFL by selectively updating local model parameters instead of aggregating all of them. However, previous work on partial model personalization has mainly focused on Convolutional Neural Networks (CNNs), leaving a gap in understanding how it can be applied to other popular models such as Vision Transformers (ViTs). In this work, we investigate where and how to partially personalize a ViT model. Specifically, we empirically evaluate the sensitivity of each type of layer to the data distribution. Based on the insight that the self-attention layer and the classification head are the most sensitive parts of a ViT, we propose a novel approach called FedPerfix, which leverages plugins to transfer information from the aggregated model to the local client as a form of personalization. Finally, we evaluate the proposed approach on the CIFAR-100, OrganAMNIST, and Office-Home datasets and demonstrate its effectiveness in improving model performance compared to several advanced PFL methods.

* 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 
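
The core idea above, keeping the most data-sensitive parts of the ViT local while aggregating the rest, can be illustrated with a short sketch. The parameter-name filters ("attn", "head") are illustrative assumptions and the attention layers are simply kept local here, which is a simplification of the FedPerfix plugin design.

    # Minimal sketch of partial model personalization in federated averaging.
    # Assumption: parameters whose names contain "attn" or "head" stay on the client;
    # everything else is aggregated with FedAvg.
    from collections import OrderedDict
    import torch

    PERSONAL_KEYS = ("attn", "head")  # assumed personalized subset (self-attention, classifier head)

    def split_params(state_dict):
        """Separate personalized (kept local) and shared (aggregated) parameters."""
        personal = OrderedDict((k, v) for k, v in state_dict.items()
                               if any(p in k for p in PERSONAL_KEYS))
        shared = OrderedDict((k, v) for k, v in state_dict.items() if k not in personal)
        return personal, shared

    def aggregate(shared_states, weights):
        """Weighted average (FedAvg) over the shared parameters only."""
        total = sum(weights)
        avg = OrderedDict()
        for k in shared_states[0]:
            avg[k] = sum(w * s[k].float() for s, w in zip(shared_states, weights)) / total
        return avg

In a real round, each client would load the aggregated shared dictionary, restore its own personalized dictionary, and then continue local training.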

RPTQ: Reorder-based Post-training Quantization for Large Language Models

Apr 25, 2023
Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu

Large-scale language models (LLMs) have demonstrated outstanding performance on various tasks, but their deployment poses challenges due to their enormous model size. In this paper, we identify that the main challenge in quantizing LLMs stems from the different activation ranges across channels, rather than just the issue of outliers. We propose a novel reorder-based quantization approach, RPTQ, that addresses the issue of quantizing the activations of LLMs. RPTQ rearranges the channels in the activations and then quantizes them in clusters, thereby reducing the impact of the range differences between channels. In addition, we reduce the storage and computation overhead by avoiding explicit reordering. By implementing this approach, we achieve a significant breakthrough by pushing LLMs to 3-bit activations for the first time.

* 17 pages 
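
As a rough illustration of the reorder-based idea, grouping channels with similar activation ranges and quantizing each group with its own scale, here is a hedged NumPy/scikit-learn sketch. The cluster count, the k-means grouping, and the asymmetric uniform quantizer are illustrative choices, not RPTQ's exact procedure.

    # Sketch of cluster-wise activation quantization under the assumptions above.
    import numpy as np
    from sklearn.cluster import KMeans

    def rptq_like_quantize(x, n_clusters=4, n_bits=3):
        """x: (tokens, channels) float activation matrix. Returns simulated-quantized activations."""
        stats = np.stack([x.min(axis=0), x.max(axis=0)], axis=1)      # per-channel (min, max)
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(stats)
        x_q = np.empty_like(x)
        qmax = 2 ** n_bits - 1
        for c in range(n_clusters):
            cols = labels == c                                         # channels in this cluster
            lo, hi = x[:, cols].min(), x[:, cols].max()
            scale = (hi - lo) / qmax if hi > lo else 1.0
            q = np.clip(np.round((x[:, cols] - lo) / scale), 0, qmax)  # asymmetric uniform quantization
            x_q[:, cols] = q * scale + lo                              # dequantize in place (simulation)
        return x_q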

Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance

Mar 23, 2023
Zhihang Yuan, Jiawei Liu, Jiaxiang Wu, Dawei Yang, Qiang Wu, Guangyu Sun, Wenyu Liu, Xinggang Wang, Bingzhe Wu

Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedures. Despite its effectiveness and convenience, the reliability of PTQ methods in the presence of extreme cases such as distribution shift and data noise remains largely unexplored. This paper first investigates this problem on various commonly-used PTQ methods. We aim to answer several research questions related to the influence of calibration set distribution variations, calibration paradigm selection, and data augmentation or sampling strategies on PTQ reliability. A systematic evaluation process is conducted across a wide range of tasks and commonly-used PTQ paradigms. The results show that most existing PTQ methods are not reliable enough in terms of worst-case group performance, highlighting the need for more robust methods. Our findings provide insights for developing PTQ methods that can effectively handle distribution shift scenarios and enable the deployment of quantized DNNs in real-world applications.
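
The worst-case group metric referred to above can be computed with a small helper. The grouping of test samples by corruption type or domain is an assumed setup for illustration.

    # Sketch: report the lowest per-group accuracy of a (quantized) model's predictions.
    from collections import defaultdict

    def worst_group_accuracy(preds, labels, groups):
        correct, total = defaultdict(int), defaultdict(int)
        for p, y, g in zip(preds, labels, groups):
            correct[g] += int(p == y)
            total[g] += 1
        per_group = {g: correct[g] / total[g] for g in total}
        return min(per_group.values()), per_group

    # Example (hypothetical names): worst, per_group = worst_group_accuracy(q_preds, labels, domain_ids)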

Latency-aware Spatial-wise Dynamic Networks

Oct 12, 2022
Yizeng Han, Zhihang Yuan, Yifan Pu, Chenhao Xue, Shiji Song, Guangyu Sun, Gao Huang

Spatial-wise dynamic convolution has become a promising approach to improving the inference efficiency of deep networks. By allocating more computation to the most informative pixels, such an adaptive inference paradigm reduces the spatial redundancy in image features and saves a considerable amount of unnecessary computation. However, the theoretical efficiency achieved by previous methods can hardly translate into a realistic speedup, especially on multi-core processors (e.g., GPUs). The key challenge is that the existing literature has only focused on designing algorithms with minimal computation, ignoring the fact that the practical latency is also influenced by scheduling strategies and hardware properties. To bridge the gap between theoretical computation and practical efficiency, we propose a latency-aware spatial-wise dynamic network (LASNet), which performs coarse-grained spatially adaptive inference under the guidance of a novel latency prediction model. The latency prediction model can efficiently estimate the inference latency of dynamic networks by simultaneously considering algorithms, scheduling strategies, and hardware properties. We use the latency predictor to guide both the algorithm design and the scheduling optimization on various hardware platforms. Experiments on image classification, object detection and instance segmentation demonstrate that the proposed framework significantly improves the practical inference efficiency of deep networks. For example, the average latency of a ResNet-101 on the ImageNet validation set can be reduced by 36% and 46% on a server GPU (Nvidia Tesla V100) and an edge device (Nvidia Jetson TX2 GPU), respectively, without sacrificing accuracy. Code is available at https://github.com/LeapLabTHU/LASNet.

* NeurIPS 2022 
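
To give a flavor of coarse-grained spatially adaptive inference, the sketch below gates computation at patch granularity with a lightweight masker. The patch size, threshold, and gating form are illustrative assumptions, not LASNet's tuned design or its latency prediction model.

    # Sketch of patch-level spatial gating under the assumptions above.
    import torch
    import torch.nn.functional as F

    def coarse_spatial_gate(features, masker, patch=4, threshold=0.5):
        """features: (N, C, H, W). masker: e.g., torch.nn.Conv2d(C, 1, kernel_size=1)."""
        pooled = F.adaptive_avg_pool2d(features, (features.shape[-2] // patch,
                                                  features.shape[-1] // patch))
        logits = masker(pooled)                                       # one logit per coarse patch
        mask = (torch.sigmoid(logits) > threshold).float()            # (N, 1, H/p, W/p)
        return F.interpolate(mask, size=features.shape[-2:], mode="nearest")

    # Illustrative use: gate = coarse_spatial_gate(x, masker)
    # out = conv(x) * gate + x * (1 - gate)   # a real kernel would skip computation where gate == 0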

Exploring Parameter-Efficient Fine-tuning for Improving Communication Efficiency in Federated Learning

Oct 04, 2022
Guangyu Sun, Matias Mendieta, Taojiannan Yang, Chen Chen

Federated learning (FL) has emerged as a promising paradigm for enabling the collaborative training of models without centralized access to the raw data on local devices. In the typical FL paradigm (e.g., FedAvg), model weights are sent to and from the server each round to participating clients. However, this can quickly put a massive communication burden on the system, especially if more capable models beyond very small MLPs are employed. Recently, the use of pre-trained models has been shown to be effective in federated learning optimization and in improving convergence. This opens the door for new research questions: can we adjust the weight-sharing paradigm in federated learning, leveraging strong and readily available pre-trained models, to significantly reduce the communication burden while simultaneously achieving excellent performance? To this end, we investigate the use of parameter-efficient fine-tuning in federated learning. Specifically, we systematically evaluate the performance of several parameter-efficient fine-tuning methods across a variety of client stability, data distribution, and differential privacy settings. By only locally tuning and globally sharing a small portion of the model weights, significant reductions in the total communication overhead can be achieved while maintaining competitive performance in a wide range of federated learning scenarios. This provides insight into a new paradigm for practical and effective federated systems.
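
The weight-sharing adjustment described above amounts to freezing the pre-trained backbone, training only a small parameter-efficient module locally, and exchanging just those weights. A minimal sketch, assuming adapters and bias terms are the trainable subset (an illustrative rule, not the paper's exact configuration):

    # Sketch: communicate only parameter-efficient weights between clients and server.
    import torch

    def freeze_backbone(model):
        """Keep only assumed parameter-efficient parts trainable."""
        for name, p in model.named_parameters():
            p.requires_grad = ("adapter" in name) or name.endswith("bias")

    def client_payload(model):
        """Only the trainable (parameter-efficient) weights are sent back to the server."""
        return {n: p.detach().cpu() for n, p in model.named_parameters() if p.requires_grad}

    def server_aggregate(payloads):
        """Uniform average of the small payloads from participating clients."""
        keys = payloads[0].keys()
        return {k: torch.stack([p[k] for p in payloads]).mean(dim=0) for k in keys}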

Anomaly Crossing: A New Method for Video Anomaly Detection as Cross-domain Few-shot Learning

Dec 14, 2021
Guangyu Sun, Zhang Liu, Lianggong Wen, Jing Shi, Chenliang Xu

Video anomaly detection aims to identify abnormal events in videos. Since anomalous events are relatively rare, it is not feasible to collect a balanced dataset and train a binary classifier to solve the task. Thus, most previous approaches learn only from normal videos using unsupervised or semi-supervised methods. As a result, they are limited in capturing and utilizing discriminative abnormal characteristics, which leads to compromised anomaly detection performance. In this paper, to address this issue, we propose a new learning paradigm that makes full use of both normal and abnormal videos for video anomaly detection. In particular, we formulate a new learning task: cross-domain few-shot anomaly detection, which transfers knowledge learned from numerous videos in the source domain to help solve few-shot anomaly detection in the target domain. Concretely, we leverage self-supervised training on the target normal videos to reduce the domain gap and devise a meta context perception module to explore the video context of the event in the few-shot setting. Our experiments show that our method significantly outperforms baseline methods on the DoTA and UCF-Crime datasets, and the new task contributes to a more practical training paradigm for anomaly detection.
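
As a rough sketch of the few-shot classification step in the target domain, the snippet below performs nearest-prototype classification over clip features. The prototype classifier and the normal/abnormal label encoding are illustrative assumptions, not the paper's meta context perception module; the features are assumed to come from an encoder pre-trained on the source domain and adapted with self-supervision on target normal videos.

    # Sketch of prototype-based few-shot anomaly classification under the assumptions above.
    import torch
    import torch.nn.functional as F

    def prototype_classify(support_feats, support_labels, query_feats):
        """support_feats: (K, D); support_labels: (K,) tensor in {0: normal, 1: abnormal}; query_feats: (Q, D)."""
        protos = torch.stack([support_feats[support_labels == c].mean(dim=0) for c in (0, 1)])
        sims = F.cosine_similarity(query_feats.unsqueeze(1), protos.unsqueeze(0), dim=-1)  # (Q, 2)
        return sims.argmax(dim=1)  # predicted label per query clip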

PTQ4ViT: Post-Training Quantization Framework for Vision Transformers

Nov 24, 2021
Zhihang Yuan, Chenhao Xue, Yiqi Chen, Qiang Wu, Guangyu Sun

Quantization is one of the most effective methods to compress neural networks and has achieved great success on convolutional neural networks (CNNs). Recently, vision transformers have demonstrated great potential in computer vision. However, previous post-training quantization methods do not perform well on vision transformers, resulting in more than a 1% accuracy drop even with 8-bit quantization. Therefore, we analyze the problems of quantization on vision transformers. We observe that the distributions of activation values after the softmax and GELU functions are quite different from a Gaussian distribution. We also observe that common quantization metrics, such as MSE and cosine distance, are inaccurate for determining the optimal scaling factor. In this paper, we propose a twin uniform quantization method to reduce the quantization error on these activation values, and we propose to use a Hessian-guided metric to evaluate different scaling factors, which improves the accuracy of calibration at a small cost. To enable fast quantization of vision transformers, we develop an efficient framework, PTQ4ViT. Experiments show that the quantized vision transformers achieve near-lossless prediction accuracy (less than a 0.5% drop at 8-bit quantization) on the ImageNet classification task.
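
A hedged sketch of the twin-uniform idea, covering one activation tensor with two uniform ranges that have separate scales, is shown below. The split point, the bit allocation, and the absence of the Hessian-guided calibration step are simplifications for illustration and do not reproduce PTQ4ViT's exact scheme.

    # Sketch: quantize small-magnitude values with a fine scale and the rest with a coarse scale.
    import torch

    def twin_uniform_quantize(x, n_bits=8, split=0.5):
        qmax = 2 ** (n_bits - 1) - 1            # each range uses roughly half of the codebook (assumption)
        thresh = split * x.abs().max()
        fine_scale = thresh / qmax              # fine scale for the dense, small-magnitude region
        coarse_scale = x.abs().max() / qmax     # coarse scale covering the full range
        small = x.abs() <= thresh
        return torch.where(small,
                           torch.round(x / fine_scale) * fine_scale,
                           torch.round(x / coarse_scale) * coarse_scale)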

GCNear: A Hybrid Architecture for Efficient GCN Training with Near-Memory Processing

Nov 01, 2021
Zhe Zhou, Cong Li, Xuechao Wei, Guangyu Sun

Recently, Graph Convolutional Networks (GCNs) have become state-of-the-art algorithms for analyzing non-Euclidean graph data. However, it is challenging to realize efficient GCN training, especially on large graphs. The reasons are manifold: 1) GCN training incurs a substantial memory footprint. Full-batch training on large graphs even requires hundreds to thousands of gigabytes of memory to buffer the intermediate data for back-propagation. 2) GCN training involves both memory-intensive data reduction and computation-intensive feature/gradient update operations. Such a heterogeneous nature challenges current CPU/GPU platforms. 3) The irregularity of graphs and the complex training dataflow jointly increase the difficulty of improving a GCN training system's efficiency. This paper presents GCNear, a hybrid architecture to tackle these challenges. Specifically, GCNear adopts a DIMM-based memory system to provide easy-to-scale memory capacity. To match the heterogeneous nature, we categorize GCN training operations as memory-intensive Reduce and computation-intensive Update operations. We then offload Reduce operations to on-DIMM near-memory engines (NMEs), making full use of the high aggregated local bandwidth. We adopt a centralized acceleration engine (CAE) with sufficient computation capacity to process Update operations. We further propose several optimization strategies to deal with the irregularity of GCN tasks and improve GCNear's performance. We also propose a Multi-GCNear system to evaluate the scalability of GCNear.
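
The Reduce/Update categorization can be made concrete with a single GCN layer: neighbor aggregation over the sparse adjacency matrix is the memory-intensive Reduce step, and the dense feature transform is the computation-intensive Update step. The sketch below only illustrates these operation categories, not GCNear's hardware mapping; the normalized adjacency matrix and layer weights are assumed inputs.

    # Sketch of one GCN layer split into the two operation categories named above.
    import numpy as np
    import scipy.sparse as sp

    def gcn_layer(adj: sp.csr_matrix, h: np.ndarray, w: np.ndarray) -> np.ndarray:
        reduced = adj @ h                 # Reduce: memory-intensive neighbor aggregation (offloaded near memory)
        updated = reduced @ w             # Update: computation-intensive feature transform (centralized engine)
        return np.maximum(updated, 0.0)   # ReLU activation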
