Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ehsan Abbasnejad

Rethinking State Disentanglement in Causal Reinforcement Learning

Aug 24, 2024

Haiyao Cao, Zhen Zhang, Panpan Cai, Yuhang Liu, Jinan Zou, Ehsan Abbasnejad, Biwei Huang, Mingming Gong, Anton van den Hengel, Javen Qinfeng Shi

Figure 1 for Rethinking State Disentanglement in Causal Reinforcement Learning

Figure 2 for Rethinking State Disentanglement in Causal Reinforcement Learning

Figure 3 for Rethinking State Disentanglement in Causal Reinforcement Learning

Figure 4 for Rethinking State Disentanglement in Causal Reinforcement Learning

Abstract:One of the significant challenges in reinforcement learning (RL) when dealing with noise is estimating latent states from observations. Causality provides rigorous theoretical support for ensuring that the underlying states can be uniquely recovered through identifiability. Consequently, some existing work focuses on establishing identifiability from a causal perspective to aid in the design of algorithms. However, these results are often derived from a purely causal viewpoint, which may overlook the specific RL context. We revisit this research line and find that incorporating RL-specific context can reduce unnecessary assumptions in previous identifiability analyses for latent states. More importantly, removing these assumptions allows algorithm design to go beyond the earlier boundaries constrained by them. Leveraging these insights, we propose a novel approach for general partially observable Markov Decision Processes (POMDPs) by replacing the complicated structural constraints in previous methods with two simple constraints for transition and reward preservation. With the two constraints, the proposed algorithm is guaranteed to disentangle state and noise that is faithful to the underlying dynamics. Empirical evidence from extensive benchmark control tasks demonstrates the superiority of our approach over existing counterparts in effectively disentangling state belief from noise.

Via

Access Paper or Ask Questions

Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks

Jul 30, 2024

Bao Gia Doan, Afshar Shamsi, Xiao-Yu Guo, Arash Mohammadi, Hamid Alinejad-Rokny, Dino Sejdinovic, Damith C. Ranasinghe, Ehsan Abbasnejad

Figure 1 for Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks

Figure 2 for Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks

Figure 3 for Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks

Figure 4 for Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks

Abstract:Computational complexity of Bayesian learning is impeding its adoption in practical, large-scale tasks. Despite demonstrations of significant merits such as improved robustness and resilience to unseen or out-of-distribution inputs over their non- Bayesian counterparts, their practical use has faded to near insignificance. In this study, we introduce an innovative framework to mitigate the computational burden of Bayesian neural networks (BNNs). Our approach follows the principle of Bayesian techniques based on deep ensembles, but significantly reduces their cost via multiple low-rank perturbations of parameters arising from a pre-trained neural network. Both vanilla version of ensembles as well as more sophisticated schemes such as Bayesian learning with Stein Variational Gradient Descent (SVGD), previously deemed impractical for large models, can be seamlessly implemented within the proposed framework, called Bayesian Low-Rank LeArning (Bella). In a nutshell, i) Bella achieves a dramatic reduction in the number of trainable parameters required to approximate a Bayesian posterior; and ii) it not only maintains, but in some instances, surpasses the performance of conventional Bayesian learning methods and non-Bayesian baselines. Our results with large-scale tasks such as ImageNet, CAMELYON17, DomainNet, VQA with CLIP, LLaVA demonstrate the effectiveness and versatility of Bella in building highly scalable and practical Bayesian deep models for real-world applications.

* 25 pages, 14 figures, 11 tables

Via

Access Paper or Ask Questions

Knowledge Composition using Task Vectors with Learned Anisotropic Scaling

Jul 03, 2024

Frederic Z. Zhang, Paul Albert, Cristian Rodriguez-Opazo, Anton van den Hengel, Ehsan Abbasnejad

Figure 1 for Knowledge Composition using Task Vectors with Learned Anisotropic Scaling

Figure 2 for Knowledge Composition using Task Vectors with Learned Anisotropic Scaling

Figure 3 for Knowledge Composition using Task Vectors with Learned Anisotropic Scaling

Figure 4 for Knowledge Composition using Task Vectors with Learned Anisotropic Scaling

Abstract:Pre-trained models produce strong generic representations that can be adapted via fine-tuning. The learned weight difference relative to the pre-trained model, known as a task vector, characterises the direction and stride of fine-tuning. The significance of task vectors is such that simple arithmetic operations on them can be used to combine diverse representations from different domains. This paper builds on these properties of task vectors and aims to answer (1) whether components of task vectors, particularly parameter blocks, exhibit similar characteristics, and (2) how such blocks can be used to enhance knowledge composition and transfer. To this end, we introduce aTLAS, an algorithm that linearly combines parameter blocks with different learned coefficients, resulting in anisotropic scaling at the task vector level. We show that such linear combinations explicitly exploit the low intrinsic dimensionality of pre-trained models, with only a few coefficients being the learnable parameters. Furthermore, composition of parameter blocks leverages the already learned representations, thereby reducing the dependency on large amounts of data. We demonstrate the effectiveness of our method in task arithmetic, few-shot recognition and test-time adaptation, with supervised or unsupervised objectives. In particular, we show that (1) learned anisotropic scaling allows task vectors to be more disentangled, causing less interference in composition; (2) task vector composition excels with scarce or no labeled data and is less prone to domain shift, thus leading to better generalisability; (3) mixing the most informative parameter blocks across different task vectors prior to training can reduce the memory footprint and improve the flexibility of knowledge transfer. Moreover, we show the potential of aTLAS as a PEFT method, particularly with less data, and demonstrate that its scalibility.

Via

Access Paper or Ask Questions

Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling

May 27, 2024

Cristian Rodriguez-Opazo, Ehsan Abbasnejad, Damien Teney, Edison Marrese-Taylor, Hamed Damirchi, Anton van den Hengel

Figure 1 for Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling

Figure 2 for Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling

Figure 3 for Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling

Figure 4 for Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling

Abstract:Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning. Various architectures, from vision transformers (ViTs) to convolutional networks (ResNets) have been trained with CLIP to serve as general solutions to diverse vision tasks. This paper explores the differences across various CLIP-trained vision backbones. Despite using the same data and training objective, we find that these architectures have notably different representations, different classification performance across datasets, and different robustness properties to certain types of image perturbations. Our findings indicate a remarkable possible synergy across backbones by leveraging their respective strengths. In principle, classification accuracy could be improved by over 40 percentage with an informed selection of the optimal backbone per test example.Using this insight, we develop a straightforward yet powerful approach to adaptively ensemble multiple backbones. The approach uses as few as one labeled example per class to tune the adaptive combination of backbones. On a large collection of datasets, the method achieves a remarkable increase in accuracy of up to 39.1% over the best single backbone, well beyond traditional ensembles

* arXiv admin note: substantial text overlap with arXiv:2312.14400

Via

Access Paper or Ask Questions

BruSLeAttack: A Query-Efficient Score-Based Black-Box Sparse Adversarial Attack

Apr 08, 2024

Viet Quoc Vo, Ehsan Abbasnejad, Damith C. Ranasinghe

Abstract:We study the unique, less-well understood problem of generating sparse adversarial samples simply by observing the score-based replies to model queries. Sparse attacks aim to discover a minimum number-the l0 bounded-perturbations to model inputs to craft adversarial examples and misguide model decisions. But, in contrast to query-based dense attack counterparts against black-box models, constructing sparse adversarial perturbations, even when models serve confidence score information to queries in a score-based setting, is non-trivial. Because, such an attack leads to i) an NP-hard problem; and ii) a non-differentiable search space. We develop the BruSLeAttack-a new, faster (more query-efficient) Bayesian algorithm for the problem. We conduct extensive attack evaluations including an attack demonstration against a Machine Learning as a Service (MLaaS) offering exemplified by Google Cloud Vision and robustness testing of adversarial training regimes and a recent defense against black-box attacks. The proposed attack scales to achieve state-of-the-art attack success rates and query efficiency on standard computer vision tasks such as ImageNet across different model architectures. Our artefacts and DIY attack samples are available on GitHub. Importantly, our work facilitates faster evaluation of model vulnerabilities and raises our vigilance on the safety, security and reliability of deployed systems.

* Published as a conference paper at the International Conference on Learning Representations (ICLR 2024). Code is available at https://brusliattack.github.io/

Via

Access Paper or Ask Questions

Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning

Mar 12, 2024

Mark D. McDonnell, Dong Gong, Ehsan Abbasnejad, Anton van den Hengel

Figure 1 for Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning

Figure 2 for Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning

Figure 3 for Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning

Figure 4 for Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning

Abstract:Continual learning requires a model to adapt to ongoing changes in the data distribution, and often to the set of tasks to be performed. It is rare, however, that the data and task changes are completely unpredictable. Given a description of an overarching goal or data theme, which we call a realm, humans can often guess what concepts are associated with it. We show here that the combination of a large language model and an image generation model can similarly provide useful premonitions as to how a continual learning challenge might develop over time. We use the large language model to generate text descriptions of semantically related classes that might potentially appear in the data stream in future. These descriptions are then rendered using Stable Diffusion to generate new labelled image samples. The resulting synthetic dataset is employed for supervised pre-training, but is discarded prior to commencing continual learning, along with the pre-training classification head. We find that the backbone of our pre-trained networks can learn representations useful for the downstream continual learning problem, thus becoming a valuable input to any existing continual learning method. Although there are complexities arising from the domain gap between real and synthetic images, we show that pre-training models in this manner improves multiple Class Incremenal Learning (CIL) methods on fine-grained image classification benchmarks. Supporting code can be found at https://github.com/cl-premonition/premonition.

* 31 pages total (14 main paper, 5 references, 12 appendices)

Via

Access Paper or Ask Questions

Do Deep Neural Network Solutions Form a Star Domain?

Mar 12, 2024

Ankit Sonthalia, Alexander Rubinstein, Ehsan Abbasnejad, Seong Joon Oh

Figure 1 for Do Deep Neural Network Solutions Form a Star Domain?

Figure 2 for Do Deep Neural Network Solutions Form a Star Domain?

Figure 3 for Do Deep Neural Network Solutions Form a Star Domain?

Figure 4 for Do Deep Neural Network Solutions Form a Star Domain?

Abstract:Entezari et al. (2022) conjectured that neural network solution sets reachable via stochastic gradient descent (SGD) are convex, considering permutation invariances. This means that two independent solutions can be connected by a linear path with low loss, given one of them is appropriately permuted. However, current methods to test this theory often fail to eliminate loss barriers between two independent solutions (Ainsworth et al., 2022; Benzing et al., 2022). In this work, we conjecture that a more relaxed claim holds: the SGD solution set is a star domain that contains a star model that is linearly connected to all the other solutions via paths with low loss values, modulo permutations. We propose the Starlight algorithm that finds a star model of a given learning task. We validate our claim by showing that this star model is linearly connected with other independently found solutions. As an additional benefit of our study, we demonstrate better uncertainty estimates on Bayesian Model Averaging over the obtained star domain. Code is available at https://github.com/aktsonthalia/starlight.

Via

Access Paper or Ask Questions

Neural Redshift: Random Networks are not Random Functions

Mar 05, 2024

Damien Teney, Armand Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad

Figure 1 for Neural Redshift: Random Networks are not Random Functions

Figure 2 for Neural Redshift: Random Networks are not Random Functions

Figure 3 for Neural Redshift: Random Networks are not Random Functions

Figure 4 for Neural Redshift: Random Networks are not Random Functions

Abstract:Our understanding of the generalization capabilities of neural networks (NNs) is still incomplete. Prevailing explanations are based on implicit biases of gradient descent (GD) but they cannot account for the capabilities of models from gradient-free methods nor the simplicity bias recently observed in untrained networks. This paper seeks other sources of generalization in NNs. Findings. To understand the inductive biases provided by architectures independently from GD, we examine untrained, random-weight networks. Even simple MLPs show strong inductive biases: uniform sampling in weight space yields a very biased distribution of functions in terms of complexity. But unlike common wisdom, NNs do not have an inherent "simplicity bias". This property depends on components such as ReLUs, residual connections, and layer normalizations. Alternative architectures can be built with a bias for any level of complexity. Transformers also inherit all these properties from their building blocks. Implications. We provide a fresh explanation for the success of deep learning independent from gradient-based training. It points at promising avenues for controlling the solutions implemented by trained models.

* IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024

Via

Access Paper or Ask Questions

Unveiling Backbone Effects in CLIP: Exploring Representational Synergies and Variances

Dec 22, 2023

Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Ehsan Abbasnejad, Hamed Damirchi, Ignacio M. Jara, Felipe Bravo-Marquez, Anton van den Hengel

Abstract:Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning. Various neural architectures, spanning Transformer-based models like Vision Transformers (ViTs) to Convolutional Networks (ConvNets) like ResNets, are trained with CLIP and serve as universal backbones across diverse vision tasks. Despite utilizing the same data and training objectives, the effectiveness of representations learned by these architectures raises a critical question. Our investigation explores the differences in CLIP performance among these backbone architectures, revealing significant disparities in their classifications. Notably, normalizing these representations results in substantial performance variations. Our findings showcase a remarkable possible synergy between backbone predictions that could reach an improvement of over 20% through informed selection of the appropriate backbone. Moreover, we propose a simple, yet effective approach to combine predictions from multiple backbones, leading to a notable performance boost of up to 6.34\%. We will release the code for reproducing the results.

Via

Access Paper or Ask Questions

Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines

Nov 29, 2023

Hamed Damirchi, Cristian Rodríguez-Opazo, Ehsan Abbasnejad, Damien Teney, Javen Qinfeng Shi, Stephen Gould, Anton van den Hengel

Abstract:Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box. The Web likely contains the information necessary to excel on any specific application, but identifying the right data a priori is challenging. This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval. We propose to retrieve useful data from the Web at test time based on test cases that the model is uncertain about. Different from existing retrieval-augmented approaches, we then update the model to address this underlying uncertainty. We demonstrate substantial improvements in zero-shot performance, e.g. a remarkable increase of 15 percentage points in accuracy on the Stanford Cars and Flowers datasets. We also present extensive experiments that explore the impact of noisy retrieval and different learning strategies.

Via

Access Paper or Ask Questions