Chong Yu

Boosting Residual Networks with Group Knowledge

Aug 26, 2023
Shengji Tang, Peng Ye, Baopu Li, Weihao Lin, Tao Chen, Tong He, Chong Yu, Wanli Ouyang

Recent research interprets residual networks from the new perspective of an implicit ensemble model. From this view, previous methods such as stochastic depth and stimulative training have further improved the performance of residual networks by sampling and training their subnets. However, both use the same supervision for all subnets of different capacities and neglect the valuable knowledge generated by the subnets during training. In this manuscript, we mitigate the significant knowledge distillation gap caused by this uniform supervision and advocate leveraging the subnets themselves to provide diverse knowledge. Based on this motivation, we propose a group-knowledge-based training framework for boosting the performance of residual networks. Specifically, we implicitly divide all subnets into hierarchical groups by subnet-in-subnet sampling, aggregate the knowledge of the subnets in each group during training, and exploit upper-level group knowledge to supervise lower-level subnet groups. We also develop a subnet sampling strategy that naturally favors larger subnets, which prove more helpful than smaller subnets in boosting the performance of hierarchical groups. Compared with typical subnet training and other methods, our method achieves the best efficiency-performance trade-offs on multiple datasets and network structures. The code will be released soon.
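
As a rough illustration of the framework described above, the sketch below shows subnet-in-subnet sampling and upper-level-group supervision on a residual classifier. The helper names (`sample_nested_subnets`, the `active_blocks` forward argument, `model.num_blocks`) are hypothetical conveniences, not the paper's actual interface, and the exact grouping and aggregation rules may differ.

```python
# Minimal sketch of group-knowledge supervision for residual subnets.
# Assumes the model exposes `num_blocks` and accepts an `active_blocks`
# argument that skips the residual blocks not listed (both hypothetical).
import torch
import torch.nn.functional as F

def sample_nested_subnets(num_blocks, num_levels=3):
    """Subnet-in-subnet sampling: level 0 is the largest subnet,
    and each lower level drops one more residual block."""
    keep = list(range(num_blocks))
    subnets = []
    for _ in range(num_levels):
        subnets.append(keep)
        if len(keep) > 1:
            drop = torch.randint(len(keep), (1,)).item()
            keep = keep[:drop] + keep[drop + 1:]
    return subnets

def group_knowledge_loss(model, x, y, temperature=4.0):
    """Supervise lower-level subnets with aggregated upper-level knowledge."""
    subnets = sample_nested_subnets(model.num_blocks)
    logits = [model(x, active_blocks=s) for s in subnets]

    # Hard-label loss for every sampled subnet.
    loss = sum(F.cross_entropy(l, y) for l in logits)

    # Aggregate the upper-level group's knowledge (here: mean soft targets)
    # and distill it into each lower-level subnet.
    for level in range(1, len(logits)):
        teacher = torch.stack(logits[:level]).mean(dim=0).detach()
        loss = loss + F.kl_div(
            F.log_softmax(logits[level] / temperature, dim=-1),
            F.softmax(teacher / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
    return loss
```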

* 15 pages 

Adversarial Amendment is the Only Force Capable of Transforming an Enemy into a Friend

May 18, 2023
Chong Yu, Tao Chen, Zhongxue Gan

Adversarial attacks are commonly regarded as a serious threat to neural networks because of their misleading behavior. This paper presents the opposite perspective: adversarial attacks can be harnessed to improve neural models if amended correctly. Unlike traditional adversarial defense or adversarial training schemes that aim to improve adversarial robustness, the proposed adversarial amendment (AdvAmd) method aims to improve the original accuracy of neural models on benign samples. We thoroughly analyze the distribution mismatch between benign and adversarial samples; this mismatch, together with the mutual learning mechanism with identical learning ratios used in prior-art defense strategies, is the main cause of the accuracy degradation on benign samples. The proposed AdvAmd is demonstrated to steadily heal this accuracy degradation and even yields a certain accuracy boost for common neural models on benign classification, object detection, and segmentation tasks. Quantitative and ablation experiments attribute the efficacy of AdvAmd to three key components: mediate samples (to reduce the influence of the distribution mismatch through fine-grained amendment), auxiliary batch norm (to address the mutual learning mechanism and yield a smoother judgment surface), and the AdvAmd loss (to adjust the learning ratios according to different attack vulnerabilities).
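
The sketch below illustrates one plausible reading of the three components on a classification task; `attack`, the dual-batch-norm switch `bn=`, and the vulnerability weighting are placeholders rather than the paper's exact formulation.

```python
# Illustrative sketch of the AdvAmd idea: mediate samples (interpolations
# between benign and adversarial inputs), an auxiliary batch norm for the
# amended branch, and a loss reweighted by per-sample attack vulnerability.
import torch
import torch.nn.functional as F

def advamd_step(model, x, y, attack, alpha=0.5):
    x_adv = attack(model, x, y)               # any attack, e.g. PGD (placeholder)
    x_med = alpha * x + (1 - alpha) * x_adv   # mediate sample: fine-grained amendment

    logits_benign = model(x, bn="main")       # assumes dual-BN support in `model`
    logits_med = model(x_med, bn="aux")       # auxiliary BN absorbs the shifted statistics

    # Weight the amendment term by how vulnerable each sample is to the attack
    # (here: how much the true-class confidence drops under perturbation).
    with torch.no_grad():
        p_benign = F.softmax(logits_benign, dim=-1).gather(1, y[:, None]).squeeze(1)
        p_med = F.softmax(logits_med, dim=-1).gather(1, y[:, None]).squeeze(1)
        vulnerability = (p_benign - p_med).clamp(min=0)

    loss_benign = F.cross_entropy(logits_benign, y)
    loss_med = (vulnerability * F.cross_entropy(logits_med, y, reduction="none")).mean()
    return loss_benign + loss_med
```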

* Accepted to IJCAI 2023, 10 pages, 5 figures 

Boost Vision Transformer with GPU-Friendly Sparsity and Quantization

May 18, 2023
Chong Yu, Tao Chen, Zhongxue Gan, Jiayuan Fan

The transformer has extended its success from the language to the vision domain. Because of the stacked self-attention and cross-attention blocks, accelerating the deployment of vision transformers on GPU hardware is challenging and rarely studied. This paper designs a compression scheme that maximally utilizes GPU-friendly 2:4 fine-grained structured sparsity and quantization. Specifically, an original large model with dense weights is first pruned into a sparse one by 2:4 structured pruning, exploiting the GPU's acceleration of the 2:4 structured sparse pattern with the FP16 data type; the floating-point sparse model is then quantized into a fixed-point one by sparse-distillation-aware quantization-aware training, exploiting the extra GPU speedup of 2:4 sparse computation with integer tensors. A mixed-strategy knowledge distillation is used during both the pruning and quantization stages. The proposed compression scheme is flexible enough to support supervised and unsupervised learning styles. Experimental results show that the GPUSQ-ViT scheme achieves state-of-the-art compression, reducing vision transformer models by 6.4-12.7 times in model size and 30.3-62 times in FLOPs with negligible accuracy degradation on the ImageNet classification, COCO detection, and ADE20K segmentation benchmarks. Moreover, GPUSQ-ViT boosts actual deployment performance by 1.39-1.79 times in latency and 3.22-3.43 times in throughput on the A100 GPU, and by 1.57-1.69 times in latency and 2.11-2.51 times in throughput on AGX Orin.
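
The sketch below illustrates the two compression stages at a high level: a 2:4 magnitude mask on linear weights and symmetric fake quantization, combined with a hard/soft distillation loss from the dense teacher. The function names are illustrative and omit much of the GPUSQ-ViT pipeline (e.g. feature-level and sparse-distillation-aware terms).

```python
# Rough sketch of the two compression stages: 2:4 structured pruning of
# linear weights, fake quantization for QAT, and teacher-student distillation.
import torch
import torch.nn.functional as F

def prune_2_to_4(weight):
    """Zero the two smallest-magnitude values in every group of four
    contiguous input-channel weights (assumes in_features % 4 == 0)."""
    out_features, in_features = weight.shape
    w = weight.reshape(-1, 4)
    idx = w.abs().topk(2, dim=1).indices                 # keep top-2 per group
    mask = torch.zeros_like(w).scatter_(1, idx, 1.0)
    return (w * mask).reshape(out_features, in_features), mask.reshape(out_features, in_features)

def fake_quantize(x, num_bits=8):
    """Symmetric per-tensor fake quantization, simulating INT8 inference."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max() / qmax
    return torch.round(x / scale).clamp(-qmax, qmax) * scale

def distill_loss(student_logits, teacher_logits, y, T=2.0, lam=0.5):
    """Mix hard-label loss with soft-label distillation from the dense teacher."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T ** 2
    return lam * F.cross_entropy(student_logits, y) + (1 - lam) * soft
```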

* Accepted to CVPR 2023, 11 pages, 6 figures 

Accelerating Sparse Deep Neural Networks

Apr 16, 2021
Asit Mishra, Jorge Albericio Latorre, Jeff Pool, Darko Stosic, Dusan Stosic, Ganesh Venkatesh, Chong Yu, Paulius Micikevicius

As neural network model sizes have dramatically increased, so has the interest in various techniques to reduce their parameter counts and accelerate their execution. An active area of research in this field is sparsity - encouraging zero values in parameters that can then be discarded from storage or computations. While most research focuses on high levels of sparsity, there are challenges in universally maintaining model accuracy as well as achieving significant speedups over modern matrix-math hardware. To make sparsity adoption practical, the NVIDIA Ampere GPU architecture introduces sparsity support in its matrix-math units, Tensor Cores. We present the design and behavior of Sparse Tensor Cores, which exploit a 2:4 (50%) sparsity pattern that leads to twice the math throughput of dense matrix units. We also describe a simple workflow for training networks that both satisfy 2:4 sparsity pattern requirements and maintain accuracy, verifying it on a wide range of common tasks and model architectures. This workflow makes it easy to prepare accurate models for efficient deployment on Sparse Tensor Cores.
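
A minimal sketch of that workflow, assuming magnitude-based 2:4 masking of linear layers and a placeholder retraining loop; NVIDIA's Apex library ships a production implementation of this recipe (ASP), which may differ in detail.

```python
# Sketch of the described recipe: (1) train dense, (2) prune with a 2:4
# magnitude mask, (3) retrain with the original schedule while keeping the
# pruned weights at zero. Training loop and model are placeholders.
import torch
import torch.nn as nn

@torch.no_grad()
def apply_2_to_4_masks(model):
    """Build a 2:4 magnitude mask for every linear layer and zero pruned weights."""
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and module.weight.shape[1] % 4 == 0:
            w = module.weight.reshape(-1, 4)
            keep = w.abs().topk(2, dim=1).indices
            mask = torch.zeros_like(w).scatter_(1, keep, 1.0).reshape_as(module.weight)
            module.weight.mul_(mask)
            masks[name] = mask
    return masks

def retrain_sparse(model, masks, optimizer, loss_fn, loader, epochs):
    """Step 3: repeat the original training schedule, re-masking after each update."""
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
            with torch.no_grad():        # keep the 2:4 pattern fixed during retraining
                for name, module in model.named_modules():
                    if name in masks:
                        module.weight.mul_(masks[name])
    return model
```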


Self-Supervised GAN Compression

Jul 12, 2020
Chong Yu, Jeff Pool

Deep learning's success has led to larger and larger models to handle more and more complex tasks; trained models can contain millions of parameters. These large models are compute- and memory-intensive, which makes it a challenge to deploy them within tight latency, throughput, and storage constraints. Some model compression methods have been successfully applied to image classification, detection, and language models, but there has been very little work on compressing generative adversarial networks (GANs) that perform complex tasks. In this paper, we show that a standard model compression technique, weight pruning, cannot be applied to GANs using existing methods. We then develop a self-supervised compression technique that uses the trained discriminator to supervise the training of a compressed generator. We show that this framework maintains compelling performance at high degrees of sparsity, can be easily applied to new tasks and models, and enables meaningful comparisons between different pruning granularities.
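
A small sketch of the self-supervised idea, assuming the frozen original discriminator scores the pruned generator's samples and an optional reconstruction term keeps it close to the original generator on the same latents; the model classes and loss mix are placeholders, not the paper's exact setup.

```python
# Sketch: the trained discriminator supervises a compressed (pruned) generator,
# so no labels or new discriminator updates are needed.
import torch
import torch.nn.functional as F

def compress_generator(gen_pruned, gen_teacher, disc, z_sampler, optimizer, steps):
    disc.eval()          # trained discriminator acts as a fixed critic
    gen_teacher.eval()   # original generator provides reference outputs
    for _ in range(steps):
        z = z_sampler()
        fake = gen_pruned(z)
        with torch.no_grad():
            ref = gen_teacher(z)

        # (1) The frozen discriminator should still accept the pruned generator's
        #     samples: standard non-saturating generator loss.
        adv_loss = F.softplus(-disc(fake)).mean()
        # (2) Optionally stay close to the original generator on the same latents.
        rec_loss = F.l1_loss(fake, ref)

        loss = adv_loss + rec_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return gen_pruned
```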

* The appendix for this paper is in the following repository https://gitlab.com/dxxz/Self-Supervised-GAN-Compression-Appendix 