Shangling Jui

Trust your Good Friends: Source-free Domain Adaptation by Reciprocal Neighborhood Clustering

Sep 01, 2023
Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, Shangling Jui, Jian Yang

Domain adaptation (DA) aims to alleviate the domain shift between the source and target domains. Most DA methods require access to the source data, but often that is not possible (e.g. due to data privacy or intellectual property). In this paper, we address the challenging source-free domain adaptation (SFDA) problem, where the source pretrained model is adapted to the target domain in the absence of source data. Our method is based on the observation that target data, which might not align with the source domain classifier, still forms clear clusters. We capture this intrinsic structure by defining local affinity of the target data, and encourage label consistency among data with high local affinity. We observe that higher affinity should be assigned to reciprocal neighbors. To aggregate information with more context, we consider expanded neighborhoods with small affinity values. Furthermore, we consider the density around each target sample, which can alleviate the negative impact of potential outliers. In the experiments, we verify that the inherent structure of the target features is an important source of information for domain adaptation. We demonstrate that this local structure can be efficiently captured by considering the local neighbors, the reciprocal neighbors, and the expanded neighborhood. Finally, we achieve state-of-the-art performance on several 2D image and 3D point cloud recognition datasets.

* Accepted by IEEE TPAMI, extended version of conference paper arXiv:2110.04202 
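
To make the neighborhood-clustering idea above concrete, here is a minimal sketch (not the authors' released code): it builds a k-nearest-neighbor graph over target features, assigns higher affinity to reciprocal neighbors, and rewards prediction consistency between high-affinity pairs. Names such as `features`, `probs` and the affinity values are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def reciprocal_affinity_loss(features, probs, k=5, r_aff=1.0, nr_aff=0.1):
    """Encourage consistent predictions among k-nearest neighbors, giving reciprocal
    neighbors (i in j's k-NN and j in i's k-NN) a higher affinity weight."""
    feats = F.normalize(features, dim=1)
    sim = feats @ feats.t()                          # cosine similarity, (N, N)
    sim.fill_diagonal_(-1.0)                         # exclude self matches
    knn = sim.topk(k, dim=1).indices                 # k nearest neighbors per sample

    is_nn = torch.zeros_like(sim)
    is_nn.scatter_(1, knn, 1.0)                      # 1 where j is in i's k-NN
    reciprocal = (is_nn * is_nn.t()) > 0             # both directions are neighbors

    affinity = torch.where(reciprocal, r_aff * is_nn, nr_aff * is_nn)

    agreement = probs @ probs.t()                    # prediction dot products, (N, N)
    return -(affinity * agreement).sum() / affinity.sum().clamp(min=1.0)

# Toy usage with random stand-ins for target features and softmax predictions.
feats = torch.randn(32, 256)
probs = torch.softmax(torch.randn(32, 10), dim=1)
print(reciprocal_affinity_loss(feats, probs).item())
```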

Ternary Singular Value Decomposition as a Better Parameterized Form in Linear Mapping

Aug 15, 2023
Boyu Chen, Hanxuan Chen, Jiao He, Fengyu Sun, Shangling Jui

We present a simple yet novel parameterized form of linear mapping that achieves remarkable network compression performance: a pseudo SVD called Ternary SVD (TSVD). Unlike vanilla SVD, TSVD limits the $U$ and $V$ matrices in SVD to ternary matrices with entries in $\{\pm 1, 0\}$. This means that instead of using the expensive multiplication instructions, TSVD only requires addition instructions when computing $U(\cdot)$ and $V(\cdot)$. We provide direct and training transition algorithms for TSVD, analogous to Post Training Quantization and Quantization Aware Training respectively. Additionally, we analyze the convergence of the direct transition algorithms in theory. In experiments, we demonstrate that TSVD can achieve state-of-the-art network compression performance in various types of networks and tasks, including current baseline models such as ConvNext, Swin and BERT, and large language models like OPT.
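
As a rough illustration of the idea, the toy sketch below builds a crude ternary factorization of a weight matrix: it takes an ordinary SVD, quantizes the $U$ and $V$ factors to $\{-1, 0, +1\}$ by sign/thresholding, and re-fits the diagonal scales by least squares, so that multiplying by the ternary factors would only need additions and subtractions. This is for intuition only; it is not the direct or training transition algorithms described in the paper.

```python
import numpy as np

def ternarize(m, sparsity=0.3):
    """Map a real matrix to {-1, 0, +1}: zero out small entries, keep signs of the rest."""
    thresh = np.quantile(np.abs(m), sparsity)
    t = np.sign(m)
    t[np.abs(m) < thresh] = 0.0
    return t

def toy_ternary_svd(w, rank):
    """Toy ternary factorization w ~= U_t @ diag(s) @ V_t.T with ternary U_t, V_t."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    u_t, v_t = ternarize(u[:, :rank]), ternarize(vt[:rank].T)
    # Re-fit the diagonal scales by least squares so the ternary factors still match w.
    basis = np.stack([np.outer(u_t[:, i], v_t[:, i]).ravel() for i in range(rank)], axis=1)
    s_fit, *_ = np.linalg.lstsq(basis, w.ravel(), rcond=None)
    return u_t, s_fit, v_t

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32))
u_t, s, v_t = toy_ternary_svd(w, rank=16)
approx = (u_t * s) @ v_t.T
print("relative error:", np.linalg.norm(w - approx) / np.linalg.norm(w))
```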

Reparameterization through Spatial Gradient Scaling

Mar 07, 2023
Alexander Detkov, Mohammad Salameh, Muhammad Fetrat Qharabagh, Jialin Zhang, Wei Lui, Shangling Jui, Di Niu

Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training. However, there exists a gap in understanding how reparameterization may change and benefit the learning process of neural networks. In this paper, we present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks. We prove that spatial gradient scaling achieves the same learning dynamics as a branched reparameterization yet without introducing structural changes into the network. We further propose an analytical approach that dynamically learns scalings for each convolutional layer based on the spatial characteristics of its input feature map gauged by mutual information. Experiments on CIFAR-10, CIFAR-100, and ImageNet show that without searching for reparameterized structures, our proposed scaling method outperforms the state-of-the-art reparameterization strategies at a lower computational cost.

* Published at ICLR 2023. Code available at https://github.com/Ascend-Research/Reparameterization 
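
A minimal way to picture spatial gradient scaling is a backward hook that multiplies a convolution kernel's weight gradient by a spatial mask, leaving the forward pass untouched. In the paper these scalings are learned per layer from mutual-information statistics of the input feature map; the hand-set 3x3 mask below is purely a placeholder.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

# Hypothetical 3x3 spatial scaling: emphasize the center tap, down-weight the border.
scale = torch.tensor([[0.5, 0.5, 0.5],
                      [0.5, 2.0, 0.5],
                      [0.5, 0.5, 0.5]])

# Rescale the weight gradient spatially during backprop; the forward pass is unchanged.
conv.weight.register_hook(lambda grad: grad * scale)

x = torch.randn(4, 3, 16, 16)
loss = conv(x).pow(2).mean()
loss.backward()
print(conv.weight.grad.shape)  # torch.Size([8, 3, 3, 3]); each 3x3 slice was rescaled
```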

A General-Purpose Transferable Predictor for Neural Architecture Search

Feb 21, 2023
Fred X. Han, Keith G. Mills, Fabian Chudak, Parsa Riahi, Mohammad Salameh, Jialin Zhang, Wei Lu, Shangling Jui, Di Niu

Understanding and modelling the performance of neural architectures is key to Neural Architecture Search (NAS). Performance predictors have seen widespread use in low-cost NAS and achieve high ranking correlations between predicted and ground truth performance in several NAS benchmarks. However, existing predictors are often designed based on network encodings specific to a predefined search space and are therefore not generalizable to other search spaces or new architecture families. In this paper, we propose a general-purpose neural predictor for NAS that can transfer across search spaces, by representing any given candidate Convolutional Neural Network (CNN) with a Computation Graph (CG) that consists of primitive operators. We further combine our CG network representation with Contrastive Learning (CL) and propose a graph representation learning procedure that leverages the structural information of unlabeled architectures from multiple families to train CG embeddings for our performance predictor. Experimental results on NAS-Bench-101, 201 and 301 demonstrate the efficacy of our scheme as we achieve strong positive Spearman Rank Correlation Coefficient (SRCC) on every search space, outperforming several Zero-Cost Proxies, including Synflow and Jacov, which are also generalizable predictors across search spaces. Moreover, when using our proposed general-purpose predictor in an evolutionary neural architecture search algorithm, we can find high-performance architectures on NAS-Bench-101 and find a MobileNetV3 architecture that attains 79.2% top-1 accuracy on ImageNet.

* Accepted to SDM2023; version includes supplementary material; 12 Pages, 3 Figures, 6 Tables 
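
The search-space-agnostic encoding described above amounts to expressing any candidate CNN as a computation graph of primitive operators. The sketch below extracts such a graph from a tiny PyTorch model with `torch.fx`; the choice of `torch.fx`, the toy model, and the omission of node features and the contrastive/GNN training are all simplifications for illustration, not necessarily how the paper builds its CGs.

```python
import torch
import torch.nn as nn
import torch.fx as fx

class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3, padding=1)
        self.bn = nn.BatchNorm2d(16)
        self.act = nn.ReLU()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16, 10)

    def forward(self, x):
        x = self.pool(self.act(self.bn(self.conv(x))))
        return self.fc(torch.flatten(x, 1))

# Trace the network into a graph of primitive operators (nodes) and data flow (edges).
graph = fx.symbolic_trace(TinyCNN()).graph
edges = []
for node in graph.nodes:
    for consumer in node.users:
        edges.append((node.name, consumer.name))

# This (node, edge) list is the kind of search-space-agnostic encoding a transferable
# predictor can consume, e.g. as input to a graph neural network.
print(edges)
```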

AIO-P: Expanding Neural Performance Predictors Beyond Image Classification

Nov 30, 2022
Keith G. Mills, Di Niu, Mohammad Salameh, Weichen Qiu, Fred X. Han, Puyuan Liu, Jialin Zhang, Wei Lu, Shangling Jui

Evaluating neural network performance is critical to deep neural network design but a costly procedure. Neural predictors provide an efficient solution by treating architectures as samples and learning to estimate their performance on a given task. However, existing predictors are task-dependent, predominantly estimating neural network performance on image classification benchmarks. They are also search-space dependent; each predictor is designed to make predictions for a specific architecture search space with predefined topologies and sets of operations. In this paper, we propose a novel All-in-One Predictor (AIO-P), which aims to pretrain neural predictors on architecture examples from multiple, separate computer vision (CV) task domains and multiple architecture spaces, and then transfer to unseen downstream CV tasks or neural architectures. We describe our proposed techniques for general graph representation, efficient predictor pretraining and knowledge infusion, as well as methods to transfer to downstream tasks/spaces. Extensive experimental results show that AIO-P can achieve Mean Absolute Error (MAE) and Spearman's Rank Correlation (SRCC) below 1% and above 0.5, respectively, on a breadth of target downstream CV tasks with or without fine-tuning, outperforming a number of baselines. Moreover, AIO-P can directly transfer to new architectures not seen during training, accurately rank them and serve as an effective performance estimator when paired with an algorithm designed to preserve performance while reducing FLOPs.

* Accepted at AAAI 2023; version includes supplementary material; 16 Pages, 4 Figures, 22 Tables 
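
The pretrain-then-transfer pattern can be pictured as a shared architecture encoder with lightweight per-task regression heads, where transferring to a new task means attaching and fitting a fresh head. The MLP encoder, task names and shapes below are illustrative assumptions standing in for the paper's graph-based components.

```python
import torch
import torch.nn as nn

class ArchPerformancePredictor(nn.Module):
    """Shared encoder over an architecture embedding, plus one regression head per task."""
    def __init__(self, in_dim=64, hidden=128, tasks=("classification", "detection")):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden), nn.ReLU())
        self.heads = nn.ModuleDict({t: nn.Linear(hidden, 1) for t in tasks})

    def forward(self, arch_embedding, task):
        return self.heads[task](self.encoder(arch_embedding)).squeeze(-1)

predictor = ArchPerformancePredictor()
# ... pretrain the encoder and existing heads on labeled (architecture, score) pairs ...

# Transfer to an unseen downstream task: add a fresh head, keep (or lightly fine-tune)
# the pretrained encoder, and fit only a small amount of new labeled data.
predictor.heads["segmentation"] = nn.Linear(128, 1)
opt = torch.optim.Adam(predictor.heads["segmentation"].parameters(), lr=1e-3)
x = torch.randn(8, 64)                 # stand-in architecture embeddings
y = torch.rand(8)                      # stand-in downstream scores (e.g. mIoU)
loss = nn.functional.mse_loss(predictor(x, "segmentation"), y)
loss.backward()
opt.step()
print(loss.item())
```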

GENNAPE: Towards Generalized Neural Architecture Performance Estimators

Nov 30, 2022
Keith G. Mills, Fred X. Han, Jialin Zhang, Fabian Chudak, Ali Safari Mamaghani, Mohammad Salameh, Wei Lu, Shangling Jui, Di Niu

Predicting neural architecture performance is a challenging task and is crucial to neural architecture design and search. Existing approaches either rely on neural performance predictors, which are limited to modeling architectures in a predefined design space involving specific sets of operators and connection rules and cannot generalize to unseen architectures, or resort to zero-cost proxies, which are not always accurate. In this paper, we propose GENNAPE, a Generalized Neural Architecture Performance Estimator, which is pretrained on open neural architecture benchmarks, and aims to generalize to completely unseen architectures through combined innovations in network representation, contrastive pretraining, and fuzzy clustering-based predictor ensemble. Specifically, GENNAPE represents a given neural network as a Computation Graph (CG) of atomic operations which can model an arbitrary architecture. It first learns a graph encoder via Contrastive Learning to encourage network separation by topological features, and then trains multiple predictor heads, which are soft-aggregated according to the fuzzy membership of a neural network. Experiments show that GENNAPE pretrained on NAS-Bench-101 can achieve superior transferability to 5 different public neural network benchmarks, including NAS-Bench-201, NAS-Bench-301, and the MobileNet and ResNet families, with minimal or no fine-tuning. We further introduce 3 challenging newly labelled neural network benchmarks: HiAML, Inception and Two-Path, whose accuracies concentrate in narrow ranges. Extensive experiments show that GENNAPE can correctly discern high-performance architectures in these families. Finally, when paired with a search algorithm, GENNAPE can find architectures that improve accuracy while reducing FLOPs on three families.

* Accepted at AAAI 2023; version includes supplementary materials with more details on introduced benchmarks; 14 Pages, 6 Figures, 10 Tables 
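
The fuzzy clustering-based predictor ensemble boils down to soft-aggregating several predictor heads by a network's fuzzy membership weights. The snippet below is a schematic stand-in using standard fuzzy c-means memberships and arbitrary toy heads, not the paper's learned components.

```python
import numpy as np

def fuzzy_memberships(embedding, centroids, m=2.0):
    """Standard fuzzy c-means membership of one embedding w.r.t. cluster centroids."""
    d = np.linalg.norm(centroids - embedding, axis=1) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

def ensemble_predict(embedding, centroids, heads):
    """Soft-aggregate per-cluster predictor heads by the fuzzy membership weights."""
    weights = fuzzy_memberships(embedding, centroids)
    preds = np.array([head(embedding) for head in heads])
    return float(weights @ preds)

rng = np.random.default_rng(0)
centroids = rng.normal(size=(3, 16))             # one centroid per architecture family
heads = [lambda e, w=rng.normal(size=16): float(w @ e) for _ in range(3)]  # toy heads
arch_embedding = rng.normal(size=16)
print(ensemble_predict(arch_embedding, centroids, heads))
```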

Positive Pair Distillation Considered Harmful: Continual Meta Metric Learning for Lifelong Object Re-Identification

Oct 04, 2022
Kai Wang, Chenshen Wu, Andy Bagdanov, Xialei Liu, Shiqi Yang, Shangling Jui, Joost van de Weijer

Lifelong object re-identification incrementally learns from a stream of re-identification tasks. The objective is to learn a representation that can be applied to all tasks and that generalizes to previously unseen re-identification tasks. The main challenge is that at inference time the representation must generalize to previously unseen identities. To address this problem, we apply continual meta metric learning to lifelong object re-identification. To prevent forgetting of previous tasks, we use knowledge distillation and explore the roles of positive and negative pairs. Based on our observation that the distillation and metric losses are antagonistic, we propose to remove positive pairs from distillation to robustify model updates. Our method, called Distillation without Positive Pairs (DwoPP), is evaluated on extensive intra-domain experiments on person and vehicle re-identification datasets, as well as inter-domain experiments on the LReID benchmark. Our experiments demonstrate that DwoPP significantly outperforms the state-of-the-art. The code is here: https://github.com/wangkai930418/DwoPP_code

* BMVC 2022 
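
The core proposal, removing positive pairs from distillation, can be sketched as a pairwise-similarity distillation loss masked to pairs with different identities. This is a reconstruction from the abstract rather than the released DwoPP code; the embedding shapes and the MSE similarity-matching form are assumptions.

```python
import torch
import torch.nn.functional as F

def distill_without_positive_pairs(new_emb, old_emb, labels):
    """Match pairwise cosine similarities of the new model to the frozen old model,
    but only on negative pairs (different identities), since positive-pair
    distillation is antagonistic to the metric-learning loss."""
    new_sim = F.normalize(new_emb, dim=1) @ F.normalize(new_emb, dim=1).t()
    old_sim = F.normalize(old_emb, dim=1) @ F.normalize(old_emb, dim=1).t()

    negative = labels.unsqueeze(0) != labels.unsqueeze(1)   # True where identities differ
    return F.mse_loss(new_sim[negative], old_sim[negative].detach())

# Toy usage: 16 samples, 4 identities, 128-d embeddings from the old and new models.
labels = torch.randint(0, 4, (16,))
old_emb = torch.randn(16, 128)          # frozen model from the previous task
new_emb = torch.randn(16, 128, requires_grad=True)
loss = distill_without_positive_pairs(new_emb, old_emb, labels)
loss.backward()
print(loss.item())
```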

One Ring to Bring Them All: Towards Open-Set Recognition under Domain Shift

Jun 07, 2022
Shiqi Yang, Yaxing Wang, Kai Wang, Shangling Jui, Joost van de Weijer

In this paper, we investigate $\textit{open-set recognition}$ with domain shift, where the final goal is to achieve $\textit{Source-free Universal Domain Adaptation}$ (SF-UNDA), which addresses the situation where there exist both domain and category shifts between source and target domains. Under the SF-UNDA setting, the model cannot access source data anymore during target adaptation, which aims to address data privacy concerns. We propose a novel training scheme to learn an ($n$+1)-way classifier to predict the $n$ source classes and the unknown class, where samples of only known source categories are available for training. Furthermore, for target adaptation, we simply adopt a weighted entropy minimization to adapt the source pretrained model to the unlabeled target domain without source data. In experiments, we show: $\textbf{1)}$ After source training, the resulting source model achieves excellent performance on $\textit{open-set single domain generalization}$ and also $\textit{open-set recognition}$ tasks; $\textbf{2)}$ After target adaptation, our method surpasses current UNDA approaches which demand source data during adaptation on several benchmarks, and this versatility across several different tasks strongly demonstrates the efficacy and generalization ability of our method; $\textbf{3)}$ When augmented with a closed-set domain adaptation approach during target adaptation, our source-free method further outperforms the current state-of-the-art UNDA method by 2.5%, 7.2% and 13% on Office-31, Office-Home and VisDA respectively. Code will be available at https://github.com/Albert0147/OneRing.
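
The target-adaptation step, weighted entropy minimization over an ($n$+1)-way classifier, can be sketched as follows. The per-sample weighting shown (prediction confidence) is a placeholder assumption, not the exact weighting used in the paper.

```python
import torch
import torch.nn.functional as F

def weighted_entropy_minimization(logits, weights=None):
    """Entropy of the (n+1)-way prediction (n known classes + 1 'unknown' class),
    averaged with per-sample weights. The confidence-based weighting here is only
    a placeholder for the paper's known/unknown weighting."""
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * torch.log(probs.clamp(min=1e-8))).sum(dim=1)   # (B,)
    if weights is None:
        weights = probs.max(dim=1).values.detach()
    return (weights * entropy).sum() / weights.sum()

# Toy adaptation step: classifier head with n=10 known classes plus one unknown class.
n_known = 10
head = torch.nn.Linear(256, n_known + 1)
target_features = torch.randn(32, 256)               # unlabeled target-domain features
loss = weighted_entropy_minimization(head(target_features))
loss.backward()
print(loss.item())
```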

Attracting and Dispersing: A Simple Approach for Source-free Domain Adaptation

May 19, 2022
Shiqi Yang, Yaxing Wang, Kai Wang, Shangling Jui, Joost van de Weijer

We propose a simple but effective source-free domain adaptation (SFDA) method. Treating SFDA as an unsupervised clustering problem and following the intuition that local neighbors in feature space should have more similar predictions than other features, we propose to optimize an objective of prediction consistency. This objective encourages features in the local neighborhood to have similar predictions while features farther away in feature space have dissimilar predictions, leading to efficient feature clustering and cluster assignment simultaneously. For efficient training, we seek to optimize an upper bound of the objective, resulting in two simple terms. Furthermore, we relate popular existing methods in domain adaptation, source-free domain adaptation and contrastive learning via the perspective of discriminability and diversity. The experimental results prove the superiority of our method, which can be adopted as a simple but strong baseline for future research in SFDA. Our method can also be adapted to source-free open-set and partial-set DA, which further shows its generalization ability.

* Update the hyperparameter section 
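
The two-term objective, attracting the predictions of nearby features while dispersing those of all other features, can be sketched with dot products of softmax outputs. Batch-level neighbor search and the fixed trade-off `lam` below are simplifications of the paper's formulation.

```python
import torch
import torch.nn.functional as F

def attract_disperse_loss(features, logits, k=3, lam=1.0):
    """Attract: neighbors in feature space should agree in prediction.
    Disperse: all other samples should disagree. Both terms use dot products
    of softmax outputs, computed within the batch for illustration."""
    probs = F.softmax(logits, dim=1)                      # (B, C)
    feats = F.normalize(features, dim=1)
    sim = feats @ feats.t()
    sim.fill_diagonal_(-1.0)
    nn_idx = sim.topk(k, dim=1).indices                   # (B, k) nearest neighbors

    agreement = probs @ probs.t()                         # (B, B) prediction dot products
    attract = agreement.gather(1, nn_idx).mean()          # maximize agreement w/ neighbors
    mask = torch.ones_like(agreement)
    mask.fill_diagonal_(0.0)
    disperse = (agreement * mask).sum() / mask.sum()      # minimize agreement w/ the rest
    return -attract + lam * disperse

feats = torch.randn(32, 256)
logits = torch.randn(32, 10, requires_grad=True)
loss = attract_disperse_loss(feats, logits)
loss.backward()
print(loss.item())
```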

R5: Rule Discovery with Reinforced and Recurrent Relational Reasoning

May 13, 2022
Shengyao Lu, Bang Liu, Keith G. Mills, Shangling Jui, Di Niu

Systematicity, i.e., the ability to recombine known parts and rules to form new sequences while reasoning over relational data, is critical to machine intelligence. A model with strong systematicity is able to train on small-scale tasks and generalize to large-scale tasks. In this paper, we propose R5, a relational reasoning framework based on reinforcement learning that reasons over relational graph data and explicitly mines underlying compositional logical rules from observations. R5 has strong systematicity and is robust to noisy data. It consists of a policy value network equipped with Monte Carlo Tree Search to perform recurrent relational prediction and a backtrack rewriting mechanism for rule mining. By alternately applying the two components, R5 progressively learns a set of explicit rules from data and performs explainable and generalizable relation prediction. We conduct extensive evaluations on multiple datasets. Experimental results show that R5 outperforms various embedding-based and rule induction baselines on relation prediction tasks while achieving a high recall rate in discovering ground truth rules.

* ICLR 2022 Spotlight 
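
As a tiny illustration of what applying a mined compositional rule looks like, the helper below chains a rule body such as father ∘ father over (head, relation, tail) facts and reports the entity pairs it derives. The toy facts and rule are made up; the paper's actual contribution, learning such rules with a policy value network and MCTS, is not reproduced here.

```python
from collections import defaultdict

def apply_rule(facts, body):
    """Apply a compositional rule body, e.g. ('father', 'father'), to a set of
    (head, relation, tail) facts and return all (start, end) pairs it derives."""
    by_rel = defaultdict(list)
    for h, r, t in facts:
        by_rel[r].append((h, t))

    pairs = {(h, t) for h, t in by_rel[body[0]]}
    for rel in body[1:]:
        step = defaultdict(set)
        for h, t in by_rel[rel]:
            step[h].add(t)
        pairs = {(s, t2) for s, t in pairs for t2 in step.get(t, ())}
    return pairs

facts = [("abe", "father", "homer"), ("homer", "father", "bart"),
         ("homer", "father", "lisa"), ("abe", "grandfather", "bart")]

# Candidate rule: grandfather(X, Z) <- father(X, Y) ^ father(Y, Z)
derived = apply_rule(facts, body=("father", "father"))
observed = {(h, t) for h, r, t in facts if r == "grandfather"}
print("derived:", derived)                      # {('abe', 'bart'), ('abe', 'lisa')}
print("supported by observed facts:", derived & observed)
```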