Anton van den Hengel

Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines

Nov 29, 2023
Hamed Damirchi, Cristian Rodríguez-Opazo, Ehsan Abbasnejad, Damien Teney, Javen Qinfeng Shi, Stephen Gould, Anton van den Hengel

Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box. The Web likely contains the information necessary to excel on any specific application, but identifying the right data a priori is challenging. This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval. We propose to retrieve useful data from the Web at test time based on test cases that the model is uncertain about. Unlike existing retrieval-augmented approaches, we then update the model to address this underlying uncertainty. We demonstrate substantial improvements in zero-shot performance, e.g., a remarkable increase of 15 percentage points in accuracy on the Stanford Cars and Flowers datasets. We also present extensive experiments that explore the impact of noisy retrieval and different learning strategies.
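The retrieve-then-adapt loop described above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the authors' released code: entropy over the zero-shot logits flags uncertain test cases, and `search_images` is a hypothetical stand-in for a search-engine API that returns weakly labeled images for the queried class names.

```python
import torch
import torch.nn.functional as F

def predictive_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Entropy of the softmax distribution; high values flag uncertain cases."""
    p = F.softmax(logits, dim=-1)
    return -(p * p.clamp_min(1e-12).log()).sum(dim=-1)

def retrieve_and_adapt(model, test_images, class_names, search_images,
                       threshold=1.5, steps=10, lr=1e-5):
    """For each uncertain test case, query the Web with the top predicted class
    names and fine-tune the model on the retrieved images (sketch only)."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for x in test_images:
        logits = model(x.unsqueeze(0))
        if predictive_entropy(logits).item() < threshold:
            continue  # the model is already confident; no retrieval needed
        top3 = logits.topk(3, dim=-1).indices.squeeze(0).tolist()
        queries = [class_names[i] for i in top3]
        images, labels = search_images(queries)  # hypothetical search-engine call
        for _ in range(steps):
            opt.zero_grad()
            F.cross_entropy(model(images), labels).backward()
            opt.step()
```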

Identifiable Latent Polynomial Causal Models Through the Lens of Change

Oct 24, 2023
Yuhang Liu, Zhen Zhang, Dong Gong, Mingming Gong, Biwei Huang, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi

Causal representation learning aims to unveil latent high-level causal representations from observed low-level data. One of its primary tasks is to provide reliable assurance of identifying these latent causal models, known as identifiability. A recent breakthrough explores identifiability by leveraging the change of causal influences among latent causal variables across multiple environments (Liu et al., 2022). However, this progress rests on the assumption that the causal relationships among latent causal variables adhere strictly to linear Gaussian models. In this paper, we extend the scope of latent causal models to nonlinear causal relationships, represented by polynomial models, and general noise distributions conforming to the exponential family. Additionally, we investigate the necessity of imposing changes on all causal parameters and present partial identifiability results for the case where some of them remain unchanged. Further, we propose a novel empirical estimation method, grounded in our theoretical findings, that enables learning consistent latent causal representations. Our experimental results, obtained from both synthetic and real-world data, validate our theoretical contributions concerning identifiability and consistency.
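For concreteness, one plausible reading of the extended model class (the notation here is ours, not the paper's): each latent causal variable is a polynomial of its parents plus exponential-family noise, with parameters that may change across environments, and the observations arise through a nonlinear mixing function.

```latex
% Sketch of the assumed generative model; notation is ours.
z_i = g_i\bigl(\mathrm{pa}(z_i);\, \theta_i(u)\bigr) + \varepsilon_i,
\qquad g_i \ \text{a polynomial}, \qquad
\varepsilon_i \sim p\bigl(\varepsilon \mid \eta_i(u)\bigr)\ \text{(exponential family)},
\qquad x = f(z),
```

where pa(z_i) denotes the causal parents of z_i, the environment index u modulates the coefficients theta_i and the noise parameters eta_i (the "change" that drives identifiability), and f is the nonlinear map from latents to the observed data x.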

Domain Generalization via Rationale Invariance

Aug 22, 2023
Liang Chen, Yong Zhang, Yibing Song, Anton van den Hengel, Lingqiao Liu

This paper offers a new perspective on easing the challenge of domain generalization, which involves maintaining robust results even in unseen environments. Our design focuses on the decision-making process in the final classifier layer. Specifically, we propose treating the element-wise contributions to the final results as the rationale for making a decision, and representing the rationale for each sample as a matrix. For a well-generalized model, we suggest that the rationale matrices for samples belonging to the same category should be similar, indicating that the model relies on domain-invariant clues to make its decisions, thereby ensuring robust results. To implement this idea, we introduce a rationale invariance loss as a simple regularization technique, requiring only a few lines of code. Our experiments demonstrate that the proposed approach achieves competitive results across various datasets, despite its simplicity. Code is available at https://github.com/liangchen527/RIDG.
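Since the loss is described as requiring only a few lines of code, here is a minimal sketch of one way to implement it, assuming a linear final classifier and a momentum-updated per-class mean rationale; the authors' exact formulation lives in the linked repository.

```python
import torch
import torch.nn.functional as F

class RationaleInvarianceLoss(torch.nn.Module):
    """Penalizes deviation of each sample's rationale matrix (element-wise
    contributions w[c, d] * f[d] to the logits) from a running per-class mean.
    A sketch of the idea in the abstract, not the authors' exact code."""

    def __init__(self, num_classes: int, feat_dim: int, momentum: float = 0.9):
        super().__init__()
        self.momentum = momentum
        self.register_buffer("mean_rationale",
                             torch.zeros(num_classes, num_classes, feat_dim))

    def forward(self, feats, classifier_weight, labels):
        # rationale[n, c, d] = classifier_weight[c, d] * feats[n, d]
        rationale = classifier_weight.unsqueeze(0) * feats.unsqueeze(1)
        loss = 0.0
        for y in labels.unique():
            r = rationale[labels == y]
            with torch.no_grad():  # update the running per-class mean rationale
                self.mean_rationale[y] = (self.momentum * self.mean_rationale[y]
                                          + (1 - self.momentum) * r.mean(dim=0))
            loss = loss + F.mse_loss(r, self.mean_rationale[y].expand_as(r))
        return loss / len(labels.unique())
```

The loss would be added, with a small weight, to the usual classification objective.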

* Accepted in ICCV 2023 

RanPAC: Random Projections and Pre-trained Models for Continual Learning

Jul 05, 2023
Mark D. McDonnell, Dong Gong, Amin Parvaneh, Ehsan Abbasnejad, Anton van den Hengel

Continual learning (CL) aims to incrementally learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones. Most CL works focus on tackling catastrophic forgetting under a learning-from-scratch paradigm. However, with the increasing prominence of foundation models, pre-trained models equipped with informative representations have become available for various downstream requirements. Several CL methods based on pre-trained models have been explored, either utilizing pre-extracted features directly (which makes bridging distribution gaps challenging) or incorporating adaptors (which may be subject to forgetting). In this paper, we propose a concise and effective approach for CL with pre-trained models. Given that forgetting occurs during parameter updating, we contemplate an alternative approach that exploits training-free random projectors and class-prototype accumulation, which thus bypasses the issue. Specifically, we inject a frozen Random Projection layer with nonlinear activation between the pre-trained model's feature representations and output head, which captures interactions between features with expanded dimensionality, providing enhanced linear separability for class-prototype-based CL. We also demonstrate the importance of decorrelating the class-prototypes to reduce the distribution disparity when using pre-trained representations. These techniques prove to be effective and circumvent the problem of forgetting for both class- and domain-incremental continual learning. Compared to previous methods applied to pre-trained ViT-B/16 models, we reduce final error rates by between 10% and 62% on seven class-incremental benchmark datasets, despite not using any rehearsal memory. We conclude that the full potential of pre-trained models for simple, effective, and fast continual learning has not hitherto been fully tapped.
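A minimal sketch of the recipe as we read it: a frozen nonlinear random projection on top of pre-trained features, training-free accumulation of class prototypes and second-order feature statistics, and decorrelation via a ridge-regularized linear solve. The specifics here (ReLU activation, the ridge term) are our assumptions, not necessarily the released implementation.

```python
import torch

class RandomProjectionHead:
    """Sketch of training-free continual learning on top of a frozen backbone:
    no parameter is ever updated by gradient descent, so nothing is forgotten."""

    def __init__(self, feat_dim, proj_dim, num_classes, ridge=1e-3):
        self.W = torch.randn(feat_dim, proj_dim)      # frozen random projection
        self.G = torch.zeros(proj_dim, proj_dim)      # accumulated Gram matrix
        self.C = torch.zeros(proj_dim, num_classes)   # per-class prototype sums
        self.ridge = ridge

    def project(self, feats):
        return torch.relu(feats @ self.W)  # expanded, nonlinear features

    def update(self, feats, labels):
        """Accumulate statistics from a new task; the backbone stays untouched."""
        h = self.project(feats)
        self.G += h.T @ h
        self.C.index_add_(1, labels, h.T)

    def predict(self, feats):
        h = self.project(feats)
        eye = torch.eye(self.G.shape[0])
        # ridge-regularized solve decorrelates the class prototypes
        beta = torch.linalg.solve(self.G + self.ridge * eye, self.C)
        return (h @ beta).argmax(dim=-1)
```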

* 30 pages, 11 figures 

Provably Efficient Bayesian Optimization with Unbiased Gaussian Process Hyperparameter Estimation

Jun 12, 2023
Huong Ha, Vu Nguyen, Hongyu Zhang, Anton van den Hengel

Gaussian process (GP) based Bayesian optimization (BO) is a powerful method for optimizing black-box functions efficiently. The practical performance and theoretical guarantees associated with this approach depend on having the correct GP hyperparameter values, which are usually unknown in advance and need to be estimated from the observed data. However, in practice, these estimates can be incorrect due to the biased data sampling strategies commonly used in BO. This can lead to degraded performance and break the sub-linear global convergence guarantee of BO. To address this issue, we propose a new BO method that can sub-linearly converge to the global optimum of the objective function even when the true GP hyperparameters are unknown in advance and need to be estimated from the observed data. Our method uses a multi-armed bandit technique (EXP3) to add random data points to the BO process, and employs a novel training loss function for the GP hyperparameter estimation process that ensures unbiased estimation from the observed data. We further provide a theoretical analysis of our proposed method. Finally, we demonstrate empirically that our method outperforms existing approaches on various synthetic and real-world problems.
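A sketch of how an EXP3 bandit can interleave uniformly random exploration points with acquisition-driven points; `fit_gp` and `acq_argmax` are placeholders for a standard GP/acquisition stack, and the bounded reward used below (whether a new incumbent was found) is our simplification, not the paper's construction.

```python
import numpy as np

def exp3_bo(objective, bounds, acq_argmax, fit_gp, n_iter=50, gamma=0.1):
    """Two-armed EXP3 over 'acquisition point' vs 'random point'; the random
    points provide less biased data for GP hyperparameter estimation.
    bounds: array of shape (d, 2) with per-dimension (low, high)."""
    rng = np.random.default_rng(0)
    weights = np.ones(2)  # arm 0: acquisition maximizer, arm 1: random point
    X, y = [], []
    for _ in range(n_iter):
        probs = (1 - gamma) * weights / weights.sum() + gamma / 2
        arm = rng.choice(2, p=probs)
        if arm == 0 and X:
            gp = fit_gp(np.array(X), np.array(y))  # hyperparameters re-estimated
            x = acq_argmax(gp, bounds)
        else:
            x = rng.uniform(bounds[:, 0], bounds[:, 1])
        fx = objective(x)
        X.append(x); y.append(fx)
        reward = 1.0 if fx <= min(y) else 0.0  # bounded reward: new best found?
        weights[arm] *= np.exp(gamma * reward / (2 * probs[arm]))
    best = int(np.argmin(y))
    return X[best], y[best]
```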

* 23 pages, 5 figures 

Knowledge Combination to Learn Rotated Detection Without Rotated Annotation

Apr 05, 2023
Tianyu Zhu, Bryce Ferenczi, Pulak Purkait, Tom Drummond, Hamid Rezatofighi, Anton van den Hengel

Rotated bounding boxes drastically reduce the output ambiguity for elongated objects, making them superior to axis-aligned bounding boxes. Despite their effectiveness, rotated detectors are not widely employed: annotating rotated bounding boxes is so laborious that many detection datasets provide only axis-aligned annotations instead. In this paper, we propose a framework that allows a model to predict precise rotated boxes while requiring only the cheaper axis-aligned annotations of the target dataset. To achieve this, we leverage the fact that neural networks are capable of learning richer representations of the target domain than the task at hand utilizes. This under-utilized representation can be exploited to address a more detailed task. Our framework combines task knowledge from an out-of-domain source dataset with stronger annotation and domain knowledge from the target dataset with weaker annotation. A novel assignment process and projection loss are used to enable co-training on the source and target datasets. As a result, the model is able to solve the more detailed task in the target domain without additional computational overhead during inference. We extensively evaluate the method on various target datasets, including a fresh-produce dataset, HRSC2016, and SSDD. The results show that the proposed method consistently performs on par with the fully supervised approach.
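To make the projection loss concrete, one plausible form, under our assumptions (boxes parameterized as center, size, angle), penalizes the mismatch between the axis-aligned hull of the predicted rotated box and the cheap axis-aligned annotation; this is our sketch, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def rotated_to_aabb(boxes: torch.Tensor) -> torch.Tensor:
    """Axis-aligned hull of rotated boxes given as (cx, cy, w, h, angle)."""
    cx, cy, w, h, a = boxes.unbind(-1)
    half_w = (w * a.cos().abs() + h * a.sin().abs()) / 2
    half_h = (w * a.sin().abs() + h * a.cos().abs()) / 2
    return torch.stack([cx - half_w, cy - half_h, cx + half_w, cy + half_h], dim=-1)

def projection_loss(pred_rotated: torch.Tensor, target_aabb: torch.Tensor):
    """The predicted rotated box, projected onto the axes, should match the
    axis-aligned annotation (x1, y1, x2, y2). Sketch only."""
    return F.smooth_l1_loss(rotated_to_aabb(pred_rotated), target_aabb)
```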

* 10 pages, 5 figures, Accepted by CVPR 2023 

Adaptive Cross Batch Normalization for Metric Learning

Mar 30, 2023
Thalaiyasingam Ajanthan, Matt Ma, Anton van den Hengel, Stephen Gould

Metric learning is a fundamental problem in computer vision whereby a model is trained to learn a semantically useful embedding space via ranking losses. Traditionally, the effectiveness of a ranking loss depends on the minibatch size, and is, therefore, inherently limited by the memory constraints of the underlying hardware. While simply accumulating the embeddings across minibatches has proved useful (Wang et al., 2020), we show that it is equally important to ensure that the accumulated embeddings are up to date. In particular, it is necessary to circumvent the representational drift between the accumulated embeddings and the feature embeddings at the current training iteration as the learnable parameters are being updated. In this paper, we model representational drift as distribution misalignment and tackle it using moment matching. The result is a simple method for updating the stored embeddings to match the first and second moments of the current embeddings at each training iteration. Experiments on three popular image retrieval datasets, namely, SOP, In-Shop, and DeepFashion2, demonstrate that our approach significantly improves the performance in all scenarios.
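As a concrete illustration of the moment-matching update, here is a minimal sketch under the assumption of per-dimension (diagonal) moments; the paper's alignment may be more elaborate.

```python
import torch

def moment_match(stored, old_mean, old_std, new_mean, new_std, eps=1e-6):
    """Re-align stored cross-batch embeddings with the current embedding
    distribution by matching first and second moments per feature dimension."""
    return (stored - old_mean) / (old_std + eps) * new_std + new_mean

# Per training iteration (sketch; `embed`, `memory`, `ranking_loss` assumed):
# cur = embed(minibatch)                               # current embeddings
# new_mean, new_std = cur.mean(dim=0), cur.std(dim=0)
# memory = moment_match(memory, mem_mean, mem_std, new_mean, new_std)
# mem_mean, mem_std = new_mean, new_std
# loss = ranking_loss(cur, torch.cat([memory, cur.detach()]))
```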

Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems

Mar 03, 2023
Yangyang Shu, Anton van den Hengel, Lingqiao Liu

Self-supervised learning (SSL) strategies have demonstrated remarkable performance in various recognition tasks. However, both our preliminary investigation and recent studies suggest that they may be less effective at learning representations for fine-grained visual recognition (FGVR), since many features helpful for optimizing SSL objectives are not suitable for characterizing the subtle differences in FGVR. To overcome this issue, we propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes, dubbed common rationales in this paper. Intuitively, common rationales tend to correspond to the discriminative patterns from the key parts of foreground objects. We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective, without using any pre-trained object part or saliency detectors, allowing it to be seamlessly integrated into the existing SSL process. Specifically, we fit the GradCAM with a branch of limited fitting capacity, which allows the branch to capture the common rationales and discard the less common discriminative patterns. At the test stage, the branch generates a set of spatial weights to selectively aggregate the features representing an instance. Extensive experimental results on four visual tasks demonstrate that the proposed method can lead to significant improvements in different evaluation settings.
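A minimal sketch of how such a limited-capacity branch could be fitted to GradCAM and used for weighted pooling at test time; the branch design (a single 1x1 convolution) is our assumption, chosen simply to keep capacity low.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RationaleBranch(nn.Module):
    """A low-capacity branch regressed onto the SSL-induced GradCAM; limited
    capacity should retain only patterns common across instances (our sketch)."""

    def __init__(self, channels: int):
        super().__init__()
        self.head = nn.Conv2d(channels, 1, kernel_size=1)  # deliberately tiny

    def forward(self, feat_map):          # feat_map: [N, C, H, W]
        return self.head(feat_map)        # predicted saliency: [N, 1, H, W]

def fit_loss(branch, feat_map, gradcam):
    """Training: regress the branch output onto the GradCAM map."""
    return F.mse_loss(branch(feat_map.detach()), gradcam)

def aggregate(branch, feat_map):
    """Test time: use the branch's map as spatial weights to pool features."""
    w = torch.softmax(branch(feat_map).flatten(2), dim=-1)  # [N, 1, H*W]
    return (feat_map.flatten(2) * w).sum(dim=-1)            # [N, C]
```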

* To Appear at CVPR 2023 

Program Generation from Diverse Video Demonstrations

Feb 01, 2023
Anthony Manchin, Jamie Sherrah, Qi Wu, Anton van den Hengel

The ability to use inductive reasoning to extract general rules from multiple observations is a vital indicator of intelligence. As humans, we use this ability not only to interpret the world around us, but also to predict the outcomes of the various interactions we experience. Generalising over multiple observations has historically been difficult for machines, especially when computer vision is required. In this paper, we propose a model that can extract general rules from video demonstrations by simultaneously performing summarisation and translation. Our approach differs from prior works by framing the problem as a multi-sequence-to-sequence task, wherein summarisation is learnt by the model. This allows our model to utilise edge cases that would otherwise be suppressed or discarded by traditional summarisation techniques. Additionally, we show that our approach can handle noisy specifications without the need for additional filtering methods. We evaluate our model by synthesising programs from video demonstrations in the Vizdoom environment, achieving state-of-the-art results with a relative increase of 11.75% in program accuracy over prior works.
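For illustration, a multi-sequence-to-sequence setup along the lines the abstract describes might look as follows; the architectural choices here (a shared frame encoder, a standard Transformer, causal masking omitted for brevity) are our assumptions, not the paper's.

```python
import torch
import torch.nn as nn

class MultiDemoProgramSynthesizer(nn.Module):
    """Encode every demonstration with a shared encoder and let the decoder
    attend over all demo encodings jointly, so summarisation across demos is
    learnt rather than hand-crafted. Sketch only."""

    def __init__(self, frame_dim: int, vocab_size: int, d_model: int = 256):
        super().__init__()
        self.embed_frames = nn.Linear(frame_dim, d_model)
        self.embed_tokens = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, demos, program_tokens):
        # demos: [N, num_demos, T, frame_dim] -> one flat memory per sample
        n, k, t, _ = demos.shape
        memory = self.embed_frames(demos).reshape(n, k * t, -1)
        tgt = self.embed_tokens(program_tokens)
        return self.out(self.transformer(memory, tgt))  # program-token logits
```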

Understanding and Improving the Role of Projection Head in Self-Supervised Learning

Dec 22, 2022
Kartik Gupta, Thalaiyasingam Ajanthan, Anton van den Hengel, Stephen Gould

Self-supervised learning (SSL) aims to produce useful feature representations without access to any human-labeled data annotations. Due to the success of recent SSL methods based on contrastive learning, such as SimCLR, this problem has gained popularity. Most current contrastive learning approaches append a parametrized projection head to the end of some backbone network to optimize the InfoNCE objective and then discard the learned projection head after training. This raises a fundamental question: Why is a learnable projection head required if we are to discard it after training? In this work, we first perform a systematic study on the behavior of SSL training focusing on the role of the projection head layers. By formulating the projection head as a parametric component for the InfoNCE objective rather than a part of the network, we present an alternative optimization scheme for training contrastive learning based SSL frameworks. Our experimental study on multiple image classification datasets demonstrates the effectiveness of the proposed approach over alternatives in the SSL literature.
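To fix notation, a standard SimCLR-style InfoNCE loss with a projection head is sketched below; viewing `proj` as a parametric component of the loss rather than of the network is one way to read the paper's reformulation. This is the standard baseline loss, not the authors' alternative optimization scheme.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, proj, temperature=0.1):
    """InfoNCE over projected, normalized embeddings of two augmented views.
    z1, z2: [N, D] backbone features; proj: the projection head (discarded
    after training in the usual recipe)."""
    h1 = F.normalize(proj(z1), dim=-1)
    h2 = F.normalize(proj(z2), dim=-1)
    logits = h1 @ h2.T / temperature  # positives lie on the diagonal
    targets = torch.arange(z1.shape[0], device=z1.device)
    return F.cross_entropy(logits, targets)
```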
