
Chongzhi Zhang


Delving Deep into the Generalization of Vision Transformers under Distribution Shifts

Jun 18, 2021
Chongzhi Zhang, Mingyuan Zhang, Shanghang Zhang, Daisheng Jin, Qiang Zhou, Zhongang Cai, Haiyu Zhao, Shuai Yi, Xianglong Liu, Ziwei Liu


Recently, Vision Transformers (ViTs) have achieved impressive results on various vision tasks, yet their generalization ability under different distribution shifts is still poorly understood. In this work, we provide a comprehensive study on the out-of-distribution generalization of ViTs. To support a systematic investigation, we first present a taxonomy of distribution shifts by categorizing them into five conceptual groups: corruption shift, background shift, texture shift, destruction shift, and style shift. Then we perform extensive evaluations of ViT variants under these groups of distribution shifts and compare their generalization ability with CNNs. Several important observations are obtained: 1) ViTs generalize better than CNNs under multiple distribution shifts. With the same or fewer parameters, ViTs lead corresponding CNNs by more than 5% in top-1 accuracy under most distribution shifts. 2) Larger ViTs gradually narrow the gap between in-distribution and out-of-distribution performance. To further improve the generalization of ViTs, we design Generalization-Enhanced ViTs by integrating adversarial learning, information theory, and self-supervised learning. By investigating these three types of generalization-enhanced ViTs, we observe their gradient sensitivity and design a smoother learning strategy to achieve a stable training process. With the modified training schemes, we improve performance on out-of-distribution data by 4% over vanilla ViTs. We comprehensively compare the three generalization-enhanced ViTs with their corresponding CNNs and observe that: 1) for the enhanced models, larger ViTs still benefit more in out-of-distribution generalization; 2) generalization-enhanced ViTs are more sensitive to hyper-parameters than their corresponding CNNs. We hope our comprehensive study can shed light on the design of more generalizable learning architectures.

* Our code is available at https://github.com/Phoenix1153/ViT_OOD_generalization 
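A minimal sketch of the kind of corruption-shift comparison described above, not the authors' released code (see the repository linked above). The model pair (DeiT-S vs. ResNet-50 as a roughly parameter-matched pair), the ImageNet-C-style folder layout, and the corruption_dir path are illustrative assumptions.

```python
# Hypothetical sketch: compare top-1 accuracy of a ViT and a CNN on an
# ImageNet-C-style corruption-shift split. Models and dataset path are
# illustrative assumptions, not the paper's exact setup.
import torch
import timm
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Assumed ImageNet-C-style layout: corruption_dir/<class_name>/*.png
corruption_dir = "data/imagenet-c/gaussian_noise/3"
loader = DataLoader(datasets.ImageFolder(corruption_dir, preprocess),
                    batch_size=64, num_workers=4)

def top1_accuracy(model):
    model.eval().to(device)
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

# DeiT-S (~22M params) vs. ResNet-50 (~25M params): roughly parameter-matched.
vit = timm.create_model("deit_small_patch16_224", pretrained=True)
cnn = timm.create_model("resnet50", pretrained=True)
print("ViT top-1:", top1_accuracy(vit))
print("CNN top-1:", top1_accuracy(cnn))
```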

Towards Overcoming False Positives in Visual Relationship Detection

Dec 24, 2020
Daisheng Jin, Xiao Ma, Chongzhi Zhang, Yizhuo Zhou, Jiashu Tao, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, Zhoujun Li, Xianglong Liu, Hongsheng Li


In this paper, we investigate the cause of the high false positive rate in Visual Relationship Detection (VRD). We observe that during training, the relationship proposal distribution is highly imbalanced: most negative relationship proposals are easy to identify, e.g., those caused by inaccurate object detection, which leads to under-fitting on low-frequency difficult proposals. This paper presents Spatially-Aware Balanced negative pRoposal sAmpling (SABRA), a robust VRD framework that alleviates the influence of false positives. To effectively optimize the model under an imbalanced distribution, SABRA adopts a Balanced Negative Proposal Sampling (BNPS) strategy for mini-batch sampling. BNPS divides proposals into five well-defined sub-classes and generates a balanced training distribution according to the inverse frequency, which yields an easier optimization landscape and significantly reduces the number of false positives. To further resolve the low-frequency, challenging false positive proposals with high spatial ambiguity, we improve the spatial modeling ability of SABRA in two aspects: a simple and efficient multi-head heterogeneous graph attention network (MH-GAT) that models the global spatial interactions of objects, and a spatial mask decoder that learns the local spatial configuration. SABRA outperforms SOTA methods by a large margin on two human-object interaction (HOI) datasets and one general VRD dataset.

* 13 pages, 5 figures 
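The inverse-frequency idea behind BNPS can be illustrated with a short, hedged sketch: given sub-class labels for negative proposals, sample a mini-batch with probability inversely proportional to sub-class frequency. The toy labels and batch size below are hypothetical; the paper's actual five sub-classes and sampler are defined in the full text.

```python
# Hedged sketch of inverse-frequency balanced sampling over proposal
# sub-classes (the five sub-classes themselves come from the paper; the
# labels below are placeholders).
import torch

def balanced_sample(subclass_labels: torch.Tensor, batch_size: int) -> torch.Tensor:
    """Return indices sampled with probability ~ 1 / sub-class frequency."""
    counts = torch.bincount(subclass_labels)            # frequency of each sub-class
    inv_freq = 1.0 / counts[subclass_labels].float()    # per-proposal inverse frequency
    probs = inv_freq / inv_freq.sum()                   # normalize to a distribution
    return torch.multinomial(probs, batch_size, replacement=True)

# Toy usage: 1000 negative proposals, heavily skewed toward sub-class 0.
labels = torch.cat([torch.zeros(900, dtype=torch.long),
                    torch.randint(1, 5, (100,))])
idx = balanced_sample(labels, batch_size=64)
print(torch.bincount(labels[idx], minlength=5))  # roughly balanced counts
```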

Patch Attack for Automatic Check-out

May 19, 2020
Aishan Liu, Jiakai Wang, Xianglong Liu, Chongzhi Zhang, Bowen Cao, Hang Yu


Adversarial examples are inputs with imperceptible perturbations that easily mislead deep neural networks (DNNs). Recently, adversarial patches, with noise confined to a small and localized region, have emerged because they are easy to apply in real-world scenarios. However, existing strategies fail to generate adversarial patches with strong generalization ability; in other words, the adversarial patches are input-specific and fail to attack images from all classes, especially those unseen during training. To address this problem, this paper proposes a bias-based framework to generate class-agnostic universal adversarial patches with strong generalization ability, which exploits both the perceptual and semantic bias of models. Regarding the perceptual bias, since DNNs are strongly biased towards textures, we exploit hard examples, which convey strong model uncertainties, and extract a textural patch prior from them via style similarities. The patch prior is closer to decision boundaries and thus promotes attacks. To further alleviate the heavy dependency on large amounts of data when training universal attacks, we also exploit the semantic bias: as a class-wise preference, prototypes are introduced and pursued by maximizing the multi-class margin to aid universal training. Taking Automatic Check-out (ACO) as the typical scenario, we conduct extensive experiments in both white-box and black-box settings, in the digital world (RPC, the largest ACO-related dataset) and the physical world (Taobao and JD, the world's largest online shopping platforms). Experimental results demonstrate that our proposed framework outperforms state-of-the-art adversarial patch attack methods.
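As a rough, hedged illustration of the universal-patch idea only (a single class-agnostic patch optimized over many images), the sketch below omits the paper's textural prior and prototype terms and simply ascends the victim model's classification loss; the patch size, placement, victim model, and optimizer settings are assumptions.

```python
# Hedged sketch of universal adversarial patch optimization: one patch,
# pasted at a fixed location, trained to maximize the victim model's loss.
# The paper's textural prior and prototype-based semantic bias are omitted.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

device = "cuda" if torch.cuda.is_available() else "cpu"
model = resnet50(weights="IMAGENET1K_V1").eval().to(device)
for p in model.parameters():
    p.requires_grad_(False)

patch = torch.rand(3, 50, 50, device=device, requires_grad=True)  # assumed 50x50 patch
optimizer = torch.optim.Adam([patch], lr=0.01)

def apply_patch(images, patch, x0=20, y0=20):
    """Paste the patch onto a batch of 224x224 images at a fixed location."""
    patched = images.clone()
    patched[:, :, y0:y0 + patch.shape[1], x0:x0 + patch.shape[2]] = patch
    return patched

def train_step(images, labels):
    """One universal-patch update over a mini-batch of (image, label) pairs."""
    optimizer.zero_grad()
    logits = model(apply_patch(images.to(device), patch.clamp(0, 1)))
    loss = -F.cross_entropy(logits, labels.to(device))  # ascend the loss
    loss.backward()
    optimizer.step()
```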


Training Robust Deep Neural Networks via Adversarial Noise Propagation

Sep 19, 2019
Aishan Liu, Xianglong Liu, Chongzhi Zhang, Hang Yu, Qiang Liu, Junfeng He


Deep neural networks have been found vulnerable in practice to noises such as adversarial examples and corruptions. A number of adversarial defense methods have been developed, which indeed improve model robustness towards adversarial examples in practice. However, relying only on training with data mixed with noises, most of them still fail to defend against generalized types of noises. Motivated by the fact that hidden layers play a very important role in maintaining a robust model, this paper proposes a simple yet powerful training algorithm named Adversarial Noise Propagation (ANP) that injects diversified noises into the hidden layers in a layer-wise manner. We show that ANP can be efficiently implemented by exploiting the nature of the popular backward-forward training style of deep models. To comprehensively understand the behaviors and contributions of hidden layers, we further explore insights from hidden-representation insensitivity and alignment with human visual perception. Extensive experiments on MNIST, CIFAR-10, CIFAR-10-C, CIFAR-10-P and ImageNet demonstrate that ANP enables strong robustness for deep models against generalized noises, including both adversarial and corrupted ones, and significantly outperforms various adversarial defense methods.

* 14 pages 
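To make the layer-wise noise injection concrete, here is a hedged sketch rather than the paper's exact ANP procedure: each wrapped layer caches the gradient of its activation from the previous backward pass and reuses its sign as injected noise in the next forward pass, piggybacking on the standard backward-forward training loop. The wrapped layers, the toy architecture, and the noise magnitude eps are assumptions.

```python
# Hedged sketch of layer-wise adversarial noise injection into hidden
# activations (not the paper's exact ANP algorithm). Each wrapper caches the
# activation gradient during backward and reuses its sign as noise in the
# next forward pass; eps and the chosen layers are illustrative.
import torch
import torch.nn as nn

class NoisyLayer(nn.Module):
    def __init__(self, inner: nn.Module, eps: float = 0.05):
        super().__init__()
        self.inner, self.eps = inner, eps
        self.cached_grad = None

    def forward(self, x):
        out = self.inner(x)
        if (self.training and self.cached_grad is not None
                and self.cached_grad.shape == out.shape):
            out = out + self.eps * self.cached_grad.sign()  # inject gradient-sign noise
        if self.training and out.requires_grad:
            out.register_hook(self._save_grad)  # cache grad for the next step
        return out

    def _save_grad(self, grad):
        self.cached_grad = grad.detach()

# Toy MLP with noise injected after each hidden block (illustrative sizes).
model = nn.Sequential(
    NoisyLayer(nn.Sequential(nn.Linear(784, 256), nn.ReLU())),
    NoisyLayer(nn.Sequential(nn.Linear(256, 256), nn.ReLU())),
    nn.Linear(256, 10),
)
```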

Towards Noise-Robust Neural Networks via Progressive Adversarial Training

Sep 17, 2019
Hang Yu, Aishan Liu, Xianglong Liu, Jichen Yang, Chongzhi Zhang


Adversarial examples, intentionally designed inputs that tend to mislead deep neural networks, have attracted great attention in the past few years. Although a series of defense strategies have been developed and achieve encouraging model robustness, most of them are still vulnerable to the more commonly encountered corruptions, e.g., Gaussian noise, blur, etc., in the real world. In this paper, we theoretically and empirically show that there exists an inherent connection between adversarial robustness and corruption robustness. Based on this fundamental discovery, we further propose a more powerful training method named Progressive Adversarial Training (PAT) that adds diversified adversarial noises progressively during training, and thus obtains models that are robust against both adversarial examples and corruptions through higher training-data complexity. Meanwhile, we also theoretically show that PAT promises better generalization ability. Experimental evaluations on MNIST, CIFAR-10 and SVHN show that PAT is able to enhance the robustness and generalization of state-of-the-art network structures, performing comprehensively well compared to various augmentation methods. Moreover, we also propose Mixed Test to evaluate model generalization ability more fairly.
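A hedged sketch of the progressive idea described above: the adversarial perturbation budget grows over epochs so that the injected noise becomes gradually stronger and more diverse. The linear schedule, the epsilon range, and the use of single-step FGSM (rather than the paper's exact noise construction) are assumptions.

```python
# Hedged sketch of progressive adversarial training: the perturbation
# budget eps grows linearly with the epoch. FGSM is used here for brevity;
# the paper's actual noise schedule and attack may differ.
import torch
import torch.nn.functional as F

def fgsm(model, images, labels, eps):
    """Single-step gradient-sign perturbation within an L-inf ball of radius eps."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    grad, = torch.autograd.grad(loss, images)
    return (images + eps * grad.sign()).clamp(0, 1).detach()

def train_progressive(model, loader, optimizer, epochs=30,
                      eps_min=0.0, eps_max=8 / 255, device="cuda"):
    model.to(device).train()
    for epoch in range(epochs):
        # Progressively stronger noise as training proceeds.
        eps = eps_min + (eps_max - eps_min) * epoch / max(epochs - 1, 1)
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            adv = fgsm(model, images, labels, eps)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(adv), labels)
            loss.backward()
            optimizer.step()
```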


Interpreting and Improving Adversarial Robustness with Neuron Sensitivity

Sep 16, 2019
Chongzhi Zhang, Aishan Liu, Xianglong Liu, Yitao Xu, Hang Yu, Yuqing Ma, Tianlin Li


Deep neural networks (DNNs) are vulnerable to adversarial examples, where inputs with imperceptible perturbations mislead DNNs to incorrect results. Despite the potential risk they bring, adversarial examples are also valuable for providing insights into the weaknesses and blind spots of DNNs. Thus, the interpretability of a DNN in the adversarial setting aims to explain the rationale behind its decision-making process and to provide a deeper understanding that leads to better practical applications. To address this issue, we try to explain adversarial robustness for deep models from a new perspective of neuron sensitivity, which is measured by the intensity of neuron behavior variation between benign and adversarial examples. In this paper, we first draw a close connection between adversarial robustness and neuron sensitivity, as sensitive neurons make the most non-trivial contributions to model predictions in the adversarial setting. Based on that, we further propose to improve adversarial robustness by constraining the similarities of sensitive neurons between benign and adversarial examples, which stabilizes the behaviors of sensitive neurons in the adversarial setting. Moreover, we demonstrate that state-of-the-art adversarial training methods improve model robustness by reducing neuron sensitivity, which in turn confirms the strong connection between adversarial robustness and neuron sensitivity as well as the effectiveness of using sensitive neurons to build robust models. Extensive experiments on various datasets demonstrate that our algorithm achieves excellent results.
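One hedged reading of the neuron-sensitivity notion above: measure, per channel of a chosen layer, the mean absolute difference between activations on benign and adversarial inputs, then regularize the most sensitive channels to behave similarly on both. The channel-level granularity, the top-k selection, and the squared-error penalty are illustrative assumptions, not the paper's exact definitions.

```python
# Hedged sketch: per-channel sensitivity as the mean absolute activation
# difference between benign and adversarial inputs, plus a regularizer that
# penalizes the top-k most sensitive channels. The paper's exact metric and
# loss may differ.
import torch

def neuron_sensitivity(feat_benign: torch.Tensor, feat_adv: torch.Tensor) -> torch.Tensor:
    """feat_*: (N, C, H, W) activations from the same layer. Returns (C,) scores."""
    diff = (feat_benign - feat_adv).abs()
    return diff.mean(dim=(0, 2, 3))  # average over batch and spatial dimensions

def sensitive_neuron_loss(feat_benign, feat_adv, top_k=32):
    """Penalty that pulls the most sensitive channels together on benign vs. adversarial inputs."""
    sens = neuron_sensitivity(feat_benign.detach(), feat_adv.detach())
    idx = sens.topk(top_k).indices                      # most sensitive channels
    return (feat_benign[:, idx] - feat_adv[:, idx]).pow(2).mean()
```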
