Zhi-Hong Deng

Are More Layers Beneficial to Graph Transformers?

Mar 01, 2023

Haiteng Zhao, Shuming Ma, Dongdong Zhang, Zhi-Hong Deng, Furu Wei

Abstract: Although going deep has proven successful in many neural architectures, existing graph transformers are relatively shallow. In this work, we explore whether more layers are beneficial to graph transformers, and find that current graph transformers hit a bottleneck when trying to improve performance by increasing depth. Our further analysis reveals that deep graph transformers are limited by the vanishing capacity of global attention, which restricts the graph transformer from focusing on the critical substructure and obtaining expressive features. To this end, we propose a novel graph transformer model named DeepGraph that explicitly employs substructure tokens in the encoded representation, and applies local attention on related nodes to obtain substructure-based attention encoding. Our model enhances the ability of the global attention to focus on substructures and promotes the expressiveness of the representations, addressing the limitation of self-attention as the graph transformer deepens. Experiments show that our method unblocks the depth limitation of graph transformers and results in state-of-the-art performance across various graph benchmarks with deeper models.

* ICLR 2023

Detachedly Learn a Classifier for Class-Incremental Learning

Feb 23, 2023

Ziheng Li, Shibo Jie, Zhi-Hong Deng

Abstract: In continual learning, a model needs to continually learn a feature extractor and classifier on a sequence of tasks. This paper focuses on how to learn a classifier based on a pretrained feature extractor under the continual learning setting. We present a probabilistic analysis showing that the failure of vanilla experience replay (ER) comes from unnecessary re-learning of previous tasks and an inability to distinguish the current task from the previous ones, which is the cause of knowledge degradation and prediction bias. To overcome these weaknesses, we propose a novel replay strategy, task-aware experience replay. It rebalances the replay loss and detaches the classifier weights for the old tasks from the update process, by which the previous knowledge is kept intact and overfitting on the episodic memory is alleviated. Experimental results show our method outperforms current state-of-the-art methods.
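
The idea of detaching old-task classifier weights from the update can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a bias-free linear softmax classifier, and the helper `sgd_step_detached` and the toy numbers are hypothetical. Rows for old classes still contribute to the logits but receive no gradient update.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sgd_step_detached(weights, feature, label, old_classes, lr=0.1):
    """One cross-entropy SGD step on a linear classifier (no bias).

    Rows of `weights` whose class index is in `old_classes` are treated as
    constants (detached): they enter the logits but are not updated.
    This is an illustrative assumption, not the paper's exact procedure.
    """
    logits = [sum(w * x for w, x in zip(row, feature)) for row in weights]
    probs = softmax(logits)
    new_weights = []
    for c, row in enumerate(weights):
        if c in old_classes:
            new_weights.append(list(row))  # detached: copy unchanged
            continue
        # Gradient of cross-entropy w.r.t. row c is (p_c - 1[c == label]) * feature.
        coeff = probs[c] - (1.0 if c == label else 0.0)
        new_weights.append([w - lr * coeff * x for w, x in zip(row, feature)])
    return new_weights

# Classes 0 and 1 belong to old tasks; class 2 is the current task.
weights = [[0.5, -0.2], [0.1, 0.3], [0.0, 0.0]]
updated = sgd_step_detached(weights, feature=[1.0, 2.0], label=2, old_classes={0, 1})
```

In this sketch only the row for class 2 moves, so whatever the old-class rows encode is left untouched by the new sample.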

FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer

Dec 06, 2022

Shibo Jie, Zhi-Hong Deng

Abstract: Recent work has explored the potential to adapt a pre-trained vision transformer (ViT) by updating only a few parameters so as to improve storage efficiency, called parameter-efficient transfer learning (PETL). Current PETL methods have shown that by tuning only 0.5% of the parameters, ViT can be adapted to downstream tasks with even better performance than full fine-tuning. In this paper, we aim to further promote the efficiency of PETL to meet the extreme storage constraints of real-world applications. To this end, we propose a tensorization-decomposition framework to store the weight increments, in which the weights of each ViT are tensorized into a single 3D tensor, and their increments are then decomposed into lightweight factors. In the fine-tuning process, only the factors need to be updated and stored, termed Factor-Tuning (FacT). On the VTAB-1K benchmark, our method performs on par with NOAH, the state-of-the-art PETL method, while being 5x more parameter-efficient. We also present a tiny version that uses only 8K trainable parameters (0.01% of ViT's parameters) but outperforms full fine-tuning and many other PETL methods such as VPT and BitFit. In few-shot settings, FacT also beats all PETL baselines using the fewest parameters, demonstrating its strong capability in the low-data regime.

* Accepted at AAAI 2023. Code: https://github.com/JieShibo/PETL-ViT
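
To give a feel for why storing factors instead of full weight increments saves space, here is a simplified sketch. It assumes a plain rank-r matrix factorization of a single weight increment (ΔW = U Vᵀ), which is a stand-in for, not a reproduction of, the 3D tensorization described in the abstract. The helper names and the rank value are hypothetical; 768 is the hidden size of ViT-B and is used only for illustration.

```python
def matmul(a, b):
    # Plain nested-list matrix product: (n x k) times (k x m).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dense_params(d_out, d_in):
    # Parameters needed to store a full increment matrix.
    return d_out * d_in

def factor_params(d_out, d_in, rank):
    # Parameters needed to store U (d_out x rank) and V (d_in x rank).
    return rank * (d_out + d_in)

# Tiny rank-1 example: the 3 x 4 increment is rebuilt from 3 + 4 numbers.
u = [[1.0], [2.0], [0.0]]
v = [[0.5], [0.0], [-1.0], [1.0]]
delta_w = matmul(u, transpose(v))

# Illustrative sizes only: one 768 x 768 matrix at rank 8.
full = dense_params(768, 768)
compact = factor_params(768, 768, 8)
```

Under these assumptions a single 768 x 768 increment needs 589,824 stored values when dense and 12,288 when factorized at rank 8; sharing factors across layers, as a tensorized scheme can, would reduce the count further.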

Convolutional Bypasses Are Better Vision Transformer Adapters

Jul 18, 2022

Shibo Jie, Zhi-Hong Deng

Abstract: The pretrain-then-finetune paradigm has been widely adopted in computer vision. However, as the size of Vision Transformer (ViT) grows exponentially, full finetuning becomes prohibitive because of the heavier storage overhead. Motivated by parameter-efficient transfer learning (PETL) on language transformers, recent studies attempt to insert lightweight adaptation modules (e.g., adapter layers or prompt tokens) into pretrained ViT and only finetune these modules while the pretrained weights are frozen. However, these modules were originally proposed to finetune language models. Although they port well to ViT, their design lacks prior knowledge for visual tasks. In this paper, we propose to construct Convolutional Bypasses (Convpass) in ViT as adaptation modules, introducing only a small number of trainable parameters (less than 0.5% of model parameters) to adapt the large ViT. Unlike other PETL methods, Convpass benefits from the hard-coded inductive bias of convolutional layers and thus is more suitable for visual tasks, especially in the low-data regime. Experimental results on the VTAB-1k benchmark and few-shot learning datasets demonstrate that Convpass outperforms current language-oriented adaptation modules, demonstrating the necessity of tailoring vision-oriented adaptation modules for vision models.

Certified Robustness Against Natural Language Attacks by Causal Intervention

May 26, 2022

Haiteng Zhao, Chang Ma*, Xinshuai Dong, Anh Tuan Luu, Zhi-Hong Deng, Hanwang Zhang

Abstract: Deep learning models have achieved great success in many fields, yet they are vulnerable to adversarial examples. This paper follows a causal perspective to look into the adversarial vulnerability and proposes Causal Intervention by Semantic Smoothing (CISS), a novel framework towards robustness against natural language attacks. Instead of merely fitting observational data, CISS learns causal effects p(y|do(x)) by smoothing in the latent semantic space to make robust predictions, which scales to deep architectures and avoids tedious construction of noise customized for specific attacks. CISS is provably robust against word substitution attacks, as well as empirically robust even when perturbations are strengthened by unknown attack algorithms. For example, on YELP, CISS surpasses the runner-up by 6.7% in terms of certified robustness against word substitutions, and achieves 79.4% empirical robustness when syntactic attacks are integrated.

Bypassing Logits Bias in Online Class-Incremental Learning with a Generative Framework

May 19, 2022

Gehui Shen, Shibo Jie, Ziheng Li, Zhi-Hong Deng

Abstract: Continual learning requires the model to maintain the learned knowledge while continually learning from a non-i.i.d. data stream. Due to the single-pass training setting, online continual learning is very challenging, but it is closer to real-world scenarios where quick adaptation to new data is appealing. In this paper, we focus on the online class-incremental learning setting, in which new classes emerge over time. Almost all existing methods are replay-based with a softmax classifier. However, the inherent logits bias problem in the softmax classifier is a main cause of catastrophic forgetting, while existing solutions are not applicable to online settings. To bypass this problem, we abandon the softmax classifier and propose a novel generative framework based on the feature space. In our framework, a generative classifier which utilizes replay memory is used for inference, and the training objective is a pair-based metric learning loss which is proven theoretically to optimize the feature space in a generative way. In order to improve the ability to learn new data, we further propose a hybrid of generative and discriminative loss to train the model. Extensive experiments on several benchmarks, including newly introduced task-free datasets, show that our method beats a series of state-of-the-art replay-based methods with discriminative classifiers, and consistently reduces catastrophic forgetting by a remarkable margin.
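
As a rough illustration of classifying in feature space from a replay memory rather than through softmax logits, the sketch below uses a nearest-class-mean rule. This is a common generative-style classifier chosen here as an assumption; the abstract does not specify the exact classifier, and the function names and toy features are hypothetical.

```python
def class_means(memory):
    """Compute one mean feature vector per class.

    `memory` is a list of (feature_vector, label) pairs, standing in for
    features of samples kept in a replay buffer.
    """
    sums, counts = {}, {}
    for feat, label in memory:
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(feature, means):
    # Assign the class whose mean is closest in squared Euclidean distance.
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(means, key=lambda c: sq_dist(feature, means[c]))

memory = [
    ([0.0, 0.0], "cat"), ([0.2, 0.0], "cat"),
    ([1.0, 1.0], "dog"), ([1.0, 0.8], "dog"),
]
means = class_means(memory)
near_cat = predict([0.1, 0.1], means)
near_dog = predict([0.9, 0.9], means)
```

Because the decision depends only on distances to per-class statistics, there is no per-class output weight that can drift toward recently seen classes, which is the intuition behind avoiding logits bias.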

Alleviating Representational Shift for Continual Fine-tuning

Apr 22, 2022

Shibo Jie, Zhi-Hong Deng, Ziheng Li

Abstract: We study a practical setting of continual learning: continually fine-tuning a pre-trained model. Previous work has found that, when training on new tasks, the features (penultimate layer representations) of previous data change, a phenomenon called representational shift. Besides the shift of features, we reveal that the intermediate layers' representational shift (IRS) also matters since it disrupts batch normalization, which is another crucial cause of catastrophic forgetting. Motivated by this, we propose ConFiT, a fine-tuning method incorporating two components, cross-convolution batch normalization (Xconv BN) and hierarchical fine-tuning. Xconv BN maintains pre-convolution running means instead of post-convolution ones, and recovers the post-convolution means before testing, which corrects the inaccurate estimates of means under IRS. Hierarchical fine-tuning leverages a multi-stage strategy to fine-tune the pre-trained network, preventing massive changes in Conv layers and thus alleviating IRS. Experimental results on four datasets show that our method remarkably outperforms several state-of-the-art methods with lower storage overhead.
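
One reason post-convolution means can be recovered from pre-convolution means is that a convolution is an affine map, so the mean of its outputs equals the map applied to the mean of its inputs. The sketch below checks this for a 1x1 convolution treated as a linear layer; it is an illustration under that simplifying assumption (no padding or spatial effects), not the paper's Xconv BN procedure, and all names and values are hypothetical.

```python
def linear(weight, bias, x):
    # Affine map y = W x + b on plain Python lists.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def mean(vectors):
    # Element-wise mean over a list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

weight = [[1.0, 2.0], [0.0, -1.0]]
bias = [0.5, 0.0]
inputs = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]

# Mean tracked after the layer (what standard BN statistics would see).
post_mean_direct = mean([linear(weight, bias, x) for x in inputs])

# Mean tracked before the layer, then pushed through the current weights.
post_mean_recovered = linear(weight, bias, mean(inputs))
```

The two results agree, which suggests why tracking statistics before the convolution can stay valid even after the convolution weights themselves have changed.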

Rethinking Minimal Sufficient Representation in Contrastive Learning

Apr 02, 2022

Haoqing Wang, Xun Guo, Zhi-Hong Deng, Yan Lu

Abstract: Contrastive learning between different views of the data achieves outstanding success in the field of self-supervised representation learning, and the learned representations are useful in a broad range of downstream tasks. Since all supervision information for one view comes from the other view, contrastive learning approximately obtains the minimal sufficient representation, which contains the shared information and eliminates the non-shared information between views. Considering the diversity of the downstream tasks, it cannot be guaranteed that all task-relevant information is shared between views. Therefore, we assume the non-shared task-relevant information cannot be ignored and theoretically prove that the minimal sufficient representation in contrastive learning is not sufficient for the downstream tasks, which causes performance degradation. This reveals a new problem: contrastive learning models risk over-fitting to the shared information between views. To alleviate this problem, we propose to increase the mutual information between the representation and input as regularization to approximately introduce more task-relevant information, since we cannot utilize any downstream task information during training. Extensive experiments verify the rationality of our analysis and the effectiveness of our method. It significantly improves the performance of several classic contrastive learning models in downstream tasks. Our code is available at https://github.com/Haoqing-Wang/InfoCL.

* Accepted by CVPR 2022 as Oral presentation

G$^3$SR: Global Graph Guided Session-based Recommendation

Mar 12, 2022

Zhi-Hong Deng, Chang-Dong Wang, Ling Huang, Jian-Huang Lai, Philip S. Yu

Abstract: Session-based recommendation tries to make use of anonymous session data to deliver high-quality recommendations under the condition that user profiles and the complete historical behavioral data of a target user are unavailable. Previous works consider each session individually and try to capture user interests within a session. Despite their encouraging results, these models can only perceive intra-session items and cannot draw upon the massive historical relational information. To solve this problem, we propose a novel method named G$^3$SR (Global Graph Guided Session-based Recommendation). G$^3$SR decomposes the session-based recommendation workflow into two steps. First, a global graph is built upon all session data, from which the global item representations are learned in an unsupervised manner. Then, these representations are refined on session graphs under the graph networks, and a readout function is used to generate session representations for each session. Extensive experiments on two real-world benchmark datasets show remarkable and consistent improvements of the G$^3$SR method over the state-of-the-art methods, especially for cold items.

Cross-Domain Few-Shot Classification via Adversarial Task Augmentation

May 02, 2021

Haoqing Wang, Zhi-Hong Deng

Abstract: Few-shot classification aims to recognize unseen classes with few labeled samples from each class. Many meta-learning models for few-shot classification carefully design various task-shared inductive biases (meta-knowledge) to solve such tasks, and achieve impressive performance. However, when there is a domain shift between the training tasks and the test tasks, the obtained inductive bias fails to generalize across domains, which degrades the performance of the meta-learning models. In this work, we aim to improve the robustness of the inductive bias through task augmentation. Concretely, we consider the worst-case problem around the source task distribution, and propose the adversarial task augmentation method, which can generate inductive bias-adaptive 'challenging' tasks. Our method can be used as a simple plug-and-play module for various meta-learning models and improves their cross-domain generalization capability. We conduct extensive experiments under the cross-domain setting, using nine few-shot classification datasets: mini-ImageNet, CUB, Cars, Places, Plantae, CropDiseases, EuroSAT, ISIC and ChestX. Experimental results show that our method can effectively improve the few-shot classification performance of the meta-learning models under domain shift, and outperforms existing work. Our code is available at https://github.com/Haoqing-Wang/CDFSL-ATA.

* Accepted by IJCAI-21 (the 30th International Joint Conference on Artificial Intelligence) Main Track
