Gengwei Zhang

SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model

Mar 09, 2023
Gengwei Zhang, Liyuan Wang, Guoliang Kang, Ling Chen, Yunchao Wei


The goal of continual learning is to improve the performance of recognition models on sequentially arriving data. Although most existing works are established on the premise of learning from scratch, growing efforts have been devoted to incorporating the benefits of pre-training. However, how to adaptively exploit the pre-trained knowledge for each incremental task while maintaining its generalizability remains an open question. In this work, we present an extensive analysis of continual learning on a pre-trained model (CLPM), and attribute the key challenge to a progressive overfitting problem. Observing that selectively reducing the learning rate can almost resolve this issue in the representation layer, we propose a simple but extremely effective approach named Slow Learner with Classifier Alignment (SLCA), which further improves the classification layer by modeling class-wise feature distributions and aligning the classification layers in a post-hoc fashion. Across a variety of scenarios, our proposal provides substantial improvements for CLPM (e.g., up to 49.76%, 50.05%, 44.69% and 40.16% on Split CIFAR-100, Split ImageNet-R, Split CUB-200 and Split Cars-196, respectively), and thus outperforms state-of-the-art approaches by a large margin. Based on such a strong baseline, critical factors and promising directions are analyzed in depth to facilitate subsequent research.
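As an illustrative sketch of the two ideas in the abstract, the snippet below pairs a reduced ("slow") learning rate for backbone (representation) parameters with a normal rate for the classification head, and keeps per-class feature statistics from which pseudo-features can be sampled for post-hoc classifier alignment. All function names, the learning-rate ratio, and the Gaussian modeling choice are assumptions for illustration, not the paper's implementation.

```python
import random

def sgd_step(params, grads, lr_backbone, lr_head):
    """One SGD update: a reduced ('slow') rate for backbone parameters,
    a normal rate for the classification head (names are illustrative)."""
    new = dict(params)
    for name, g in grads.items():
        lr = lr_backbone if name.startswith("backbone") else lr_head
        new[name] = params[name] - lr * g
    return new

def class_stats(features_by_class):
    """Per-class feature mean and variance, stored for post-hoc alignment."""
    stats = {}
    for c, feats in features_by_class.items():
        dim = len(feats[0])
        mean = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
        var = [sum((f[i] - mean[i]) ** 2 for f in feats) / len(feats)
               for i in range(dim)]
        stats[c] = (mean, var)
    return stats

def sample_pseudo_features(stats, n_per_class, rng):
    """Draw Gaussian pseudo-features from stored class statistics; these can
    be used to re-align the classifier across all seen classes."""
    samples = []
    for c, (mean, var) in stats.items():
        for _ in range(n_per_class):
            samples.append((c, [rng.gauss(m, v ** 0.5)
                                for m, v in zip(mean, var)]))
    return samples
```

In an actual system the head would then be re-trained on the sampled pseudo-features, so old classes remain represented even though their raw data is gone.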

* Tech report. 11 pages, 8 figures 

Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism

May 06, 2022
Binbin Yang, Xinchi Deng, Han Shi, Changlin Li, Gengwei Zhang, Hang Xu, Shen Zhao, Liang Lin, Xiaodan Liang


Continual learning is a challenging real-world problem for constructing a mature AI system when data are provided in a streaming fashion. Despite recent progress in continual classification, research on continual object detection is impeded by the diverse sizes and numbers of objects in each image. Different from previous works that tune the whole network for all tasks, we present a simple and flexible framework for continual object detection via a pRotOtypical taSk corrElaTion guided gaTing mechAnism (ROSETTA). Concretely, a unified framework is shared by all tasks, while task-aware gates are introduced to automatically select sub-models for specific tasks. In this way, various knowledge can be successively memorized by storing the corresponding sub-model weights in the system. To let ROSETTA automatically determine which experience is available and useful, a prototypical task correlation guided Gating Diversity Controller (GDC) is introduced to adaptively adjust the diversity of gates for the new task based on class-specific prototypes. The GDC module computes a class-to-class correlation matrix to depict the cross-task correlation, and thereby activates more exclusive gates for the new task if a significant domain gap is observed. Comprehensive experiments on COCO-VOC, KITTI-Kitchen, class-incremental detection on VOC, and sequential learning of four tasks show that ROSETTA yields state-of-the-art performance on both task-based and class-based continual object detection.
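A minimal sketch of the correlation-to-diversity idea: compute a class-to-class correlation (cosine similarity) matrix between the new task's class prototypes and stored old-task prototypes, and map a low best-match similarity (a large domain gap) to a higher gate-diversity score. The scoring function and its range are assumptions for illustration, not ROSETTA's actual controller.

```python
import math

def cosine(a, b):
    """Cosine similarity between two prototype vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a))
                  * math.sqrt(sum(x * x for x in b)))

def gate_diversity(new_protos, old_protos):
    """Score in [0, 1]: higher means a larger domain gap, so more exclusive
    gates should be activated for the new task (illustrative stand-in)."""
    # class-to-class correlation matrix between new and old class prototypes
    corr = [[cosine(p, q) for q in old_protos] for p in new_protos]
    # closest old class per new class; low similarity implies a larger gap
    best_match = [max(row) for row in corr]
    return 1.0 - max(0.0, sum(best_match) / len(best_match))
```

Orthogonal prototypes yield a score near 1 (activate exclusive gates); near-duplicate prototypes yield a score near 0 (share existing sub-models).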


NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models

Aug 07, 2021
Hang Xu, Ning Kang, Gengwei Zhang, Chuanlong Xie, Xiaodan Liang, Zhenguo Li


Fine-tuning from pre-trained ImageNet models has been a simple, effective, and popular approach for various computer vision tasks. The common practice is to adopt a default hyperparameter setting with a fixed pre-trained model, although neither is optimized for the specific task and time constraints. Moreover, in cloud computing or GPU clusters where tasks arrive sequentially in a stream, faster online fine-tuning is a more desirable and realistic strategy for saving money, energy consumption, and CO2 emissions. In this paper, we propose a joint Neural Architecture Search and Online Adaption framework named NASOA for faster task-oriented fine-tuning upon user request. Specifically, NASOA first adopts offline NAS to identify a group of training-efficient networks that form a pre-trained model zoo. We propose a novel joint block- and macro-level search space to enable a flexible and efficient search. Then, estimating fine-tuning performance with an adaptive model that accumulates experience from past tasks, an online schedule generator picks the most suitable model and generates a personalized training regime for each task in a one-shot fashion. The resulting model zoo is more training-efficient than SOTA models, e.g., 6x faster than RegNetY-16GF and 1.7x faster than EfficientNetB3. Experiments on multiple datasets also show that NASOA achieves much better fine-tuning results, improving accuracy by around 2.1% over the best performance of the RegNet series under various constraints and tasks, while being 40x faster than BOHB.
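The schedule-generator step can be reduced to a simple decision: among zoo models whose estimated training time fits the user's budget, pick the one with the highest predicted fine-tuning accuracy. The sketch below shows that selection logic only; the function name, the time/accuracy interface, and the budget semantics are assumptions for illustration, not NASOA's actual generator.

```python
def pick_model_and_schedule(zoo, estimate_acc, time_budget):
    """zoo: {model_name: estimated_train_hours};
    estimate_acc(name): predicted fine-tuning accuracy, learned from
    past-task experience. Returns the feasible model with the highest
    predicted accuracy, or None if nothing fits the budget."""
    feasible = [name for name, hours in zoo.items() if hours <= time_budget]
    if not feasible:
        return None
    return max(feasible, key=estimate_acc)
```

With a generous budget the bigger, more accurate model wins; with a tight budget the generator falls back to the faster network.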

* Accepted by ICCV 2021 

Few-Shot Segmentation via Cycle-Consistent Transformer

Jun 04, 2021
Gengwei Zhang, Guoliang Kang, Yunchao Wei, Yi Yang


Few-shot segmentation aims to train a segmentation model that can quickly adapt to novel classes from only a few exemplars. The conventional training paradigm is to learn to make predictions on query images conditioned on features from support images. Previous methods utilized only the semantic-level prototypes of support images as the conditional information; they cannot exploit all pixel-wise support information for the query predictions, which is, however, critical for the segmentation task. In this paper, we focus on utilizing pixel-wise relationships between support and target images to facilitate few-shot semantic segmentation. We design a novel Cycle-Consistent Transformer (CyCTR) module to aggregate pixel-wise support features into query ones. CyCTR performs cross-attention between features from different images, i.e., support and query images. We observe that there may exist unexpected irrelevant pixel-level support features; directly performing cross-attention may aggregate these features into the query and bias the query features. Thus, we propose a novel cycle-consistent attention mechanism to filter out possibly harmful support features and encourage query features to attend to the most informative pixels of the support images. Experiments on all few-shot segmentation benchmarks demonstrate that our proposed CyCTR leads to remarkable improvement over previous state-of-the-art methods. Specifically, on the Pascal-$5^i$ and COCO-$20^i$ datasets, we achieve 66.6% and 45.6% mIoU for 5-shot segmentation, outperforming the previous state of the art by 4.6% and 7.1%, respectively.
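One way to read "cycle-consistent attention" is as an argmax round trip over the query-support affinity matrix: follow each support pixel to the query pixel that attends to it most, then back to that query pixel's most-attended support pixel, and keep the support pixel only if the cycle lands on one with the same class label. The sketch below implements that round trip; it is an interpretation for illustration, not the paper's code.

```python
def cycle_consistent_mask(affinity, support_labels):
    """affinity[i][j]: similarity between query pixel i and support pixel j.
    Keep support pixel j only if the argmax cycle
    (support j -> query -> support) returns to a same-class support pixel."""
    n_q, n_s = len(affinity), len(affinity[0])
    # forward: for each support pixel, the query pixel attending to it most
    fwd = [max(range(n_q), key=lambda i: affinity[i][j]) for j in range(n_s)]
    # backward: for each query pixel, its most-attended support pixel
    bwd = [max(range(n_s), key=lambda j: affinity[i][j]) for i in range(n_q)]
    return [support_labels[bwd[fwd[j]]] == support_labels[j]
            for j in range(n_s)]
```

Support pixels failing the check (e.g., background clutter that happens to match query features) are masked out before cross-attention aggregation.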


Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search

Feb 09, 2021
Peidong Liu, Gengwei Zhang, Bochao Wang, Hang Xu, Xiaodan Liang, Yong Jiang, Zhenguo Li


Designing proper loss functions for vision tasks has been a long-standing research direction for advancing the capability of existing models. For object detection, the well-established classification and regression loss functions have been carefully designed by considering diverse learning challenges. Inspired by recent progress in neural architecture search, it is interesting to explore the possibility of discovering new loss function formulations by directly searching over combinations of primitive operations, so that the learned losses not only fit diverse object detection challenges, alleviating substantial human effort, but also align better with the evaluation metric and have good mathematical convergence properties. Going beyond previous auto-loss works on face recognition and image classification, our work makes the first attempt to discover new loss functions for the challenging object detection task at the level of primitive operations. We propose an effective convergence-simulation driven evolutionary search algorithm, called CSE-Autoloss, which speeds up the search by regularizing the mathematical rationality of loss candidates via convergence property verification and model optimization simulation. CSE-Autoloss covers a search space spanning a wide range of possible variants of existing losses and discovers the best loss function combinations within a short time (around 1.5 wall-clock days). We conduct extensive evaluations of loss function search on popular detectors and validate the good generalization capability of the searched losses across diverse architectures and datasets. Our experiments show that the best-discovered loss function combinations outperform the default combinations by 1.1% and 0.8% mAP for two-stage and one-stage detectors on COCO, respectively. Our searched losses are available at https://github.com/PerdonLiu/CSE-Autoloss.
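A cheap filter in the spirit of "convergence property verification" can be sketched as follows: simulate the true-class probability rising toward 1 and reject any candidate loss that is not finite and monotonically non-increasing along the way. The exact checks CSE-Autoloss performs are richer; this is an illustrative assumption, not the paper's verifier.

```python
def passes_convergence_check(candidate_loss, steps=99):
    """Reject a candidate classification loss unless it stays finite and
    decreases (weakly) as the predicted true-class probability rises to 1."""
    probs = [(i + 1) / (steps + 1) for i in range(steps)]
    vals = [candidate_loss(p) for p in probs]
    finite = all(v == v and abs(v) < 1e12 for v in vals)  # v == v filters NaN
    monotone = all(a >= b for a, b in zip(vals, vals[1:]))
    return finite and monotone
```

Such a screen is far cheaper than training a detector, so mathematically irrational candidates can be discarded before any real optimization is simulated.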

* Accepted by ICLR2021 Poster 

Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation

Dec 07, 2020
Gengwei Zhang, Yiming Gao, Hang Xu, Hao Zhang, Zhenguo Li, Xiaodan Liang


Panoptic segmentation, which unifies instance segmentation and semantic segmentation, has recently attracted increasing attention. While most existing methods focus on designing novel architectures, we steer toward a different perspective: performing automated multi-loss adaptation (named Ada-Segment) on the fly to flexibly adjust multiple training losses over the course of training, using a controller trained to capture the learning dynamics. This offers a few advantages: it bypasses manual tuning of the sensitive loss combination, a decisive factor for panoptic segmentation; it allows explicitly modeling the learning dynamics and reconciling the learning of multiple objectives (up to ten in our experiments); and, with an end-to-end architecture, it generalizes to different datasets without the need to laboriously re-tune hyperparameters or re-adjust the training process. Our Ada-Segment brings a 2.7% panoptic quality (PQ) improvement on the COCO val split over the vanilla baseline, achieving state-of-the-art 48.5% PQ on the COCO test-dev split and 32.9% PQ on the ADE20K dataset. Extensive ablation studies reveal the ever-changing dynamics throughout the training process, necessitating an automated and adaptive learning strategy as presented in this paper.
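To make the on-the-fly adaptation concrete, here is a hand-written stand-in for the learned controller: it raises the relative weight of objectives whose loss has stalled, lowers the weight of objectives improving quickly, and renormalizes. The update rule and step size are assumptions for illustration; Ada-Segment learns this behavior rather than hard-coding it.

```python
def adapt_weights(weights, recent_losses, prev_losses, step=0.1):
    """Rebalance per-objective loss weights from recent progress:
    stalled objectives gain weight, fast-improving ones lose it."""
    new = []
    for w, cur, prev in zip(weights, recent_losses, prev_losses):
        improvement = (prev - cur) / max(prev, 1e-8)  # fraction improved
        # below-average progress (improvement < 0.5) nudges the weight up
        new.append(max(w * (1.0 + step * (0.5 - improvement)), 1e-8))
    total = sum(new)
    return [w / total for w in new]  # keep weights a convex combination
```

Applied every few hundred iterations, this keeps no single objective dominating the total loss as training dynamics shift.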

* Accepted by AAAI2021 

Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation

Oct 30, 2020
Yangxin Wu, Gengwei Zhang, Hang Xu, Xiaodan Liang, Liang Lin


Panoptic segmentation has emerged as a popular test-bed for state-of-the-art holistic scene understanding methods, requiring the simultaneous segmentation of both foreground things and background stuff. State-of-the-art panoptic segmentation networks exhibit high structural complexity across their components, i.e., the backbone, proposal-based foreground branch, segmentation-based background branch, and cross-branch feature fusion module, whose design heavily relies on expert knowledge and tedious trials. In this work, we propose an efficient, cooperative, and highly automated framework to simultaneously search for all main components, including the backbone, segmentation branches, and feature fusion module, in a unified panoptic segmentation pipeline based on the prevailing one-shot Network Architecture Search (NAS) paradigm. Notably, we extend common single-task NAS to the multi-component scenario by taking advantage of the newly proposed intra-modular search space and problem-oriented inter-modular search space, which helps us obtain an optimal network architecture that not only performs well in both instance segmentation and semantic segmentation but is also aware of the reciprocal relations between foreground things and background stuff classes. To relieve the vast computational burden incurred by applying NAS to complicated network architectures, we present a novel path-priority greedy search policy to find a robust, transferable architecture with significantly reduced search overhead. Our searched architecture, namely Auto-Panoptic, achieves the new state of the art on the challenging COCO and ADE20K benchmarks. Moreover, extensive experiments demonstrate the effectiveness of the path-priority policy and the transferability of Auto-Panoptic across different datasets. Codes and models are available at: https://github.com/Jacobew/AutoPanoptic.
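The essence of a path-priority greedy policy is to search components one at a time in a priority order, freezing each decision before moving on, so an exponential joint search becomes a sequence of small per-component searches. The sketch below shows that control flow only; the component names, candidate operations, and evaluation interface are assumptions for illustration, not Auto-Panoptic's implementation.

```python
def path_priority_greedy(components, candidates, evaluate):
    """Search components in priority order, keeping earlier choices fixed.
    components: list of component names, sorted by priority.
    candidates: {component: [candidate ops]}.
    evaluate: partial architecture dict -> validation score."""
    chosen = {}
    for comp in components:
        # evaluate each candidate with all previously chosen components fixed
        best = max(candidates[comp],
                   key=lambda op: evaluate({**chosen, comp: op}))
        chosen[comp] = best
    return chosen
```

With k components of n candidates each, this evaluates k*n partial architectures instead of n**k full ones, which is what makes multi-component NAS tractable.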

* NeurIPS2020 

Bidirectional Graph Reasoning Network for Panoptic Segmentation

Apr 14, 2020
Yangxin Wu, Gengwei Zhang, Yiming Gao, Xiajun Deng, Ke Gong, Xiaodan Liang, Liang Lin


Recent research on panoptic segmentation resorts to a single end-to-end network to combine the tasks of instance segmentation and semantic segmentation. However, prior models only unified the two related tasks at the architectural level via a multi-branch scheme, or revealed the underlying correlation between them by unidirectional feature fusion, which disregards the explicit semantic and co-occurrence relations among objects and background. Inspired by the fact that context information is critical for recognizing and localizing objects, and that inclusive object details are significant for parsing the background scene, we investigate explicitly modeling the correlations between objects and background to achieve a holistic understanding of an image in the panoptic segmentation task. We introduce a Bidirectional Graph Reasoning Network (BGRNet), which incorporates graph structure into the conventional panoptic segmentation network to mine the intra-modular and inter-modular relations within and between foreground things and background stuff classes. In particular, BGRNet first constructs image-specific graphs in the instance and semantic segmentation branches that enable flexible reasoning at the proposal level and class level, respectively. To establish correlations between the separate branches and fully leverage the complementary relations between things and stuff, we propose a Bidirectional Graph Connection Module to diffuse information across branches in a learnable fashion. Experimental results demonstrate the superiority of our BGRNet, which achieves new state-of-the-art performance on the challenging COCO and ADE20K panoptic segmentation benchmarks.
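The bidirectional diffusion can be pictured as two affinity-weighted message-passing steps: stuff nodes send messages to thing nodes along an affinity matrix, and thing nodes send messages back along its transpose. The sketch below uses scalar node features and a fixed mixing coefficient for brevity; in BGRNet the features are vectors and the connection weights are learned, so treat this as an illustrative stand-in.

```python
def bidirectional_diffuse(thing_feats, stuff_feats, affinity, alpha=0.5):
    """One round of bidirectional message passing between thing and stuff
    nodes. affinity[i][j]: edge weight between thing i and stuff j.
    Features are scalars here purely for brevity."""
    n_t, n_s = len(thing_feats), len(stuff_feats)
    # stuff -> thing: each thing node mixes in its affinity-weighted message
    new_thing = []
    for i in range(n_t):
        total = sum(affinity[i])
        msg = sum(affinity[i][j] * stuff_feats[j]
                  for j in range(n_s)) / max(total, 1e-8)
        new_thing.append((1 - alpha) * thing_feats[i] + alpha * msg)
    # thing -> stuff: same diffusion along the transposed affinity
    new_stuff = []
    for j in range(n_s):
        total = sum(affinity[i][j] for i in range(n_t))
        msg = sum(affinity[i][j] * thing_feats[i]
                  for i in range(n_t)) / max(total, 1e-8)
        new_stuff.append((1 - alpha) * stuff_feats[j] + alpha * msg)
    return new_thing, new_stuff
```

After one round, strongly affined thing/stuff pairs pull their representations toward each other, which is the co-occurrence reasoning the abstract describes.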

* CVPR2020 