Hamed Pirsiavash

University of Maryland Baltimore County

A Cookbook of Self-Supervised Learning

Apr 24, 2023

Randall Balestriero, Mark Ibrahim, Vlad Sobal, Ari Morcos, Shashank Shekhar, Tom Goldstein, Florian Bordes, Adrien Bardes, Gregoire Mialon, Yuandong Tian (+9 more)

Abstract: Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training an SSL method involves a dizzying set of choices, from the pretext tasks to the training hyper-parameters. Our goal is to lower the barrier to entry into SSL research by laying the foundations and latest SSL recipes in the style of a cookbook. We hope to empower the curious researcher to navigate the terrain of methods, understand the role of the various knobs, and gain the know-how required to explore how delicious SSL can be.

Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning

Apr 04, 2023

Ajinkya Tejankar, Maziar Sanjabi, Qifan Wang, Sinong Wang, Hamed Firooz, Hamed Pirsiavash, Liang Tan

Abstract: Recently, self-supervised learning (SSL) was shown to be vulnerable to patch-based data poisoning backdoor attacks. It was shown that an adversary can poison a small part of the unlabeled data so that when a victim trains an SSL model on it, the final model will have a backdoor that the adversary can exploit. This work aims to defend self-supervised learning against such attacks. We use a three-step defense pipeline, where we first train a model on the poisoned data. In the second step, our proposed defense algorithm (PatchSearch) uses the trained model to search the training data for poisoned samples and removes them from the training set. In the third step, a final model is trained on the cleaned-up training set. Our results show that PatchSearch is an effective defense. As an example, it improves a model's accuracy on images containing the trigger from 38.2% to 63.7%, which is very close to the clean model's accuracy, 64.6%. Moreover, we show that PatchSearch outperforms baselines and state-of-the-art defense approaches, including those using additional clean, trusted data. Our code is available at https://github.com/UCDvision/PatchSearch

* Accepted to CVPR 2023
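
The three-step pipeline described in the abstract above (train, search and remove, retrain) can be sketched as follows. This is only a structural illustration, not the PatchSearch algorithm itself: the names `three_step_defense`, `toy_train`, and `toy_score` are hypothetical, the toy "model" is a median over numbers, and removing a fixed top fraction of the highest-scoring samples is an assumption made here for simplicity.

```python
def three_step_defense(dataset, train, poison_score, remove_fraction):
    """Train, filter suspicious samples, then retrain (hypothetical skeleton)."""
    # Step 1: train an initial model on the possibly poisoned data.
    initial_model = train(dataset)
    # Step 2: rank samples by a suspicion score and drop the top fraction.
    ranked = sorted(dataset,
                    key=lambda sample: poison_score(initial_model, sample),
                    reverse=True)
    n_remove = int(len(ranked) * remove_fraction)
    cleaned = ranked[n_remove:]
    # Step 3: train the final model on the cleaned-up set.
    final_model = train(cleaned)
    return final_model, cleaned


def toy_train(samples):
    """Toy stand-in for training: the 'model' is the median sample."""
    ordered = sorted(samples)
    return ordered[len(ordered) // 2]


def toy_score(model, sample):
    """Toy stand-in for a poison score: distance from the model."""
    return abs(sample - model)


# One obviously anomalous sample (9.0) plays the role of a poisoned image.
data = [1.0, 1.2, 0.9, 1.1, 9.0, 1.05, 0.95, 1.15, 1.0, 0.98]
final_model, cleaned = three_step_defense(
    data, toy_train, toy_score, remove_fraction=0.1)
```

In a real setting, `train` would be an SSL training run and `poison_score` would come from the defense's search procedure; see the linked repository for the authors' implementation.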

Is Multi-Task Learning an Upper Bound for Continual Learning?

Oct 26, 2022

Zihao Wu, Huy Tran, Hamed Pirsiavash, Soheil Kolouri

Abstract: Continual and multi-task learning are common machine learning approaches to learning from multiple tasks. The existing works in the literature often treat multi-task learning as a sensible performance upper bound for various continual learning algorithms. While this assumption is empirically verified for different continual learning benchmarks, it is not rigorously justified. Moreover, it is conceivable that when learning from multiple tasks, a small subset of these tasks could behave as adversarial tasks, reducing the overall learning performance in a multi-task setting. In contrast, continual learning approaches can avoid the performance drop caused by such adversarial tasks to preserve their performance on the rest of the tasks, leading to better performance than a multi-task learner. This paper proposes a novel continual self-supervised learning setting, where each task corresponds to learning an invariant representation for a specific class of data augmentations. In this setting, we show that continual learning often beats multi-task learning on various benchmark datasets, including MNIST, CIFAR-10, and CIFAR-100.

SimA: Simple Softmax-free Attention for Vision Transformers

Jun 17, 2022

Soroush Abbasi Koohpayegani, Hamed Pirsiavash

Abstract: Recently, vision transformers have become very popular. However, deploying them in many applications is computationally expensive, partly due to the Softmax layer in the attention block. We introduce a simple but effective Softmax-free attention block, SimA, which normalizes query and key matrices with a simple $\ell_1$-norm instead of using a Softmax layer. Then, the attention block in SimA is a simple multiplication of three matrices, so SimA can dynamically change the ordering of the computation at test time to achieve computation that is linear in the number of tokens or the number of channels. We empirically show that SimA applied to three SOTA variations of transformers, DeiT, XCiT, and CvT, results in on-par accuracy compared to the SOTA models, without any need for a Softmax layer. Interestingly, changing SimA from multi-head to single-head has only a small effect on the accuracy, which simplifies the attention block further. The code is available here: https://github.com/UCDvision/sima
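
To make the reordering argument in the abstract above concrete, here is a minimal pure-Python sketch under stated assumptions. It assumes the $\ell_1$ normalization is applied to each channel across tokens (the abstract does not specify the axis), omits multi-head splitting and any scaling, and uses hypothetical helper names. With N tokens and D channels, computing (Q K^T) V costs on the order of N^2 D operations, while Q (K^T V) costs on the order of N D^2; both give the same result because matrix multiplication is associative.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def l1_normalize_columns(m, eps=1e-12):
    """Divide each column (channel) by its l1-norm taken over rows (tokens)."""
    norms = [sum(abs(v) for v in col) + eps for col in zip(*m)]
    return [[v / n for v, n in zip(row, norms)] for row in m]


def sima_attention(q, k, v, tokens_first=True):
    """Softmax-free attention as a product of three matrices.

    tokens_first=True  computes (Q K^T) V, quadratic in the token count.
    tokens_first=False computes Q (K^T V), quadratic in the channel count.
    """
    q_hat = l1_normalize_columns(q)
    k_hat = l1_normalize_columns(k)
    if tokens_first:
        return matmul(matmul(q_hat, transpose(k_hat)), v)
    return matmul(q_hat, matmul(transpose(k_hat), v))


# Toy input: N = 3 tokens, D = 2 channels.
Q = [[1.0, -2.0], [0.5, 1.0], [-1.5, 3.0]]
K = [[2.0, 1.0], [-1.0, 0.5], [1.0, -0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

out_a = sima_attention(Q, K, V, tokens_first=True)
out_b = sima_attention(Q, K, V, tokens_first=False)
```

A caller could pick the ordering at test time by comparing N and D, choosing `tokens_first=False` when there are many more tokens than channels.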

Backdoor Attacks on Vision Transformers

Jun 16, 2022

Akshayvarun Subramanya, Aniruddha Saha, Soroush Abbasi Koohpayegani, Ajinkya Tejankar, Hamed Pirsiavash

Abstract: Vision Transformers (ViT) have recently demonstrated exemplary performance on a variety of vision tasks and are being used as an alternative to CNNs. Their design is based on a self-attention mechanism that processes images as a sequence of patches, which is quite different compared to CNNs. Hence, it is interesting to study whether ViTs are vulnerable to backdoor attacks. Backdoor attacks happen when an attacker poisons a small part of the training data for malicious purposes. The model performance is good on clean test images, but the attacker can manipulate the decision of the model by showing the trigger at test time. To the best of our knowledge, we are the first to show that ViTs are vulnerable to backdoor attacks. We also find an intriguing difference between ViTs and CNNs: interpretation algorithms effectively highlight the trigger on test images for ViTs but not for CNNs. Based on this observation, we propose a test-time image blocking defense for ViTs which reduces the attack success rate by a large margin. Code is available here: https://github.com/UCDvision/backdoor_transformer.git

PRANC: Pseudo RAndom Networks for Compacting deep models

Jun 16, 2022

Parsa Nooralinejad, Ali Abbasi, Soheil Kolouri, Hamed Pirsiavash

Abstract: Communication becomes a bottleneck in various distributed Machine Learning settings. Here, we propose a novel training framework that leads to highly efficient communication of models between agents. In short, we train our network to be a linear combination of many pseudo-randomly generated frozen models. For communication, the source agent transmits only the 'seed' scalar used to generate the pseudo-random 'basis' networks along with the learned linear mixture coefficients. Our method, denoted as PRANC, learns almost $100\times$ fewer parameters than a deep model and still performs well on several datasets and architectures. PRANC enables 1) efficient communication of models between agents, 2) efficient model storage, and 3) accelerated inference by generating layer-wise weights on the fly. We test PRANC on CIFAR-10, CIFAR-100, tinyImageNet, and ImageNet-100 with various architectures like AlexNet, LeNet, ResNet18, ResNet20, and ResNet56 and demonstrate a massive reduction in the number of parameters while providing satisfactory performance on these benchmark datasets. The code is available at https://github.com/UCDvision/PRANC
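
The communication scheme in the abstract above (send a seed plus mixture coefficients, regenerate the basis on the receiving side) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the way a per-basis seed is derived from the shared seed, the Gaussian basis, and the fixed coefficient values, which in practice would be learned by training.

```python
import random


def basis_vector(seed, index, size):
    """Regenerate the index-th pseudo-random basis as a flat weight list."""
    # Hypothetical per-basis seeding derived from the shared seed.
    rng = random.Random(seed * 1_000_003 + index)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def reconstruct(seed, coefficients, size):
    """Rebuild weights as a linear combination of frozen random bases."""
    weights = [0.0] * size
    for index, alpha in enumerate(coefficients):
        basis = basis_vector(seed, index, size)
        for j in range(size):
            weights[j] += alpha * basis[j]
    return weights


# Sender side: only the seed and the mixture coefficients are transmitted.
seed = 42
coefficients = [0.5, -1.25, 2.0]  # placeholders for learned coefficients
num_weights = 1000
message = (seed, coefficients)    # 4 numbers instead of 1000 weights

# Receiver side: regenerate the bases from the seed and mix them.
received_seed, received_coefficients = message
weights = reconstruct(received_seed, received_coefficients, num_weights)
```

Because the generator is deterministic for a given seed, sender and receiver obtain identical weights without ever exchanging the basis itself.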

A Simple Approach to Adversarial Robustness in Few-shot Image Classification

Apr 11, 2022

Akshayvarun Subramanya, Hamed Pirsiavash

Abstract: Few-shot image classification, where the goal is to generalize to tasks with limited labeled data, has seen great progress over the years. However, the classifiers are vulnerable to adversarial examples, posing a question regarding their generalization capabilities. Recent works have tried to combine meta-learning approaches with adversarial training to improve the robustness of few-shot classifiers. We show that a simple transfer-learning based approach can be used to train adversarially robust few-shot classifiers. We also present a method for the novel classification task based on calibrating the centroid of the few-shot category towards the base classes. We show that standard adversarial training on base categories, along with a calibrated centroid-based classifier in the novel categories, outperforms or is on-par with state-of-the-art advanced methods on standard benchmarks for few-shot learning. Our method is simple, easy to scale, and with little effort can lead to robust few-shot classifiers. Code is available here: https://github.com/UCDvision/Simple_few_shot.git

Sparsity and Heterogeneous Dropout for Continual Learning in the Null Space of Neural Activations

Mar 12, 2022

Ali Abbasi, Parsa Nooralinejad, Vladimir Braverman, Hamed Pirsiavash, Soheil Kolouri

Abstract: Continual/lifelong learning from a non-stationary input data stream is a cornerstone of intelligence. Despite their phenomenal performance in a wide variety of applications, deep neural networks are prone to forgetting previously learned information upon learning new information. This phenomenon is called "catastrophic forgetting" and is deeply rooted in the stability-plasticity dilemma. Overcoming catastrophic forgetting in deep neural networks has become an active field of research in recent years. In particular, gradient projection-based methods have recently shown exceptional performance at overcoming catastrophic forgetting. This paper proposes two biologically-inspired mechanisms based on sparsity and heterogeneous dropout that significantly increase a continual learner's performance over a long sequence of tasks. Our proposed approach builds on the Gradient Projection Memory (GPM) framework. We leverage K-winner activations in each layer of a neural network to enforce layer-wise sparse activations for each task, together with a between-task heterogeneous dropout that encourages the network to use non-overlapping activation patterns between different tasks. In addition, we introduce Continual Swiss Roll as a lightweight and interpretable, yet challenging, synthetic benchmark for continual learning. Lastly, we provide an in-depth analysis of our proposed method and demonstrate a significant performance boost on various benchmark continual learning problems.
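
As a small illustration of the K-winner activations mentioned in the abstract above, the sketch below keeps the k largest activations of a layer and zeroes the rest. It is a generic top-k activation under assumptions: selecting by largest value (rather than largest magnitude) and the tie-breaking rule are choices made here, and the heterogeneous dropout and gradient projection parts of the method are not shown.

```python
def k_winners(activations, k):
    """Keep the k largest activations and set all others to zero."""
    if k <= 0:
        return [0.0] * len(activations)
    if k >= len(activations):
        return list(activations)
    # Indices of the k largest values; ties go to the earlier position.
    order = sorted(range(len(activations)),
                   key=lambda i: activations[i],
                   reverse=True)
    winners = set(order[:k])
    return [a if i in winners else 0.0 for i, a in enumerate(activations)]


layer_output = [0.1, 0.9, -0.3, 0.5, 0.2]
sparse_output = k_winners(layer_output, k=2)
# sparse_output is [0.0, 0.9, 0.0, 0.5, 0.0]
```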

Amenable Sparse Network Investigator

Feb 18, 2022

Saeed Damadi, Erfan Nouri, Hamed Pirsiavash

Abstract: As the optimization problem of pruning a neural network is nonconvex and the strategies are only guaranteed to find local solutions, a good initialization becomes paramount. To this end, we present the Amenable Sparse Network Investigator (ASNI) algorithm that learns a sparse network whose initialization is compressed. The learned sparse structure found by ASNI is amenable since its corresponding initialization, which is also learned by ASNI, consists of only 2L numbers, where L is the number of layers. Requiring just a few numbers for parameter initialization of the learned sparse network makes the sparse network amenable. The learned initialization set consists of L signed pairs that act as the centroids of parameter values of each layer. These centroids are learned by the ASNI algorithm after only a single round of training. We experimentally show that the learned centroids are sufficient to initialize the nonzero parameters of the learned sparse structure in order to achieve approximately the accuracy of the non-sparse network. We also empirically show that in order to learn the centroids, one needs to prune the network globally and gradually. Hence, for parameter pruning we propose a novel strategy based on a sigmoid function that specifies the sparsity percentage across the network globally. Then, pruning is done magnitude-wise and after each epoch of training. We have performed a series of experiments utilizing networks such as ResNets, VGG-style, small convolutional, and fully connected ones on ImageNet, CIFAR10, and MNIST datasets.
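
The pruning strategy in the abstract above (a sigmoid that sets a global sparsity target, followed by magnitude pruning after each epoch) might look roughly like the sketch below. The exact sigmoid parameterization is not given in the abstract, so the `midpoint` and `steepness` choices here are hypothetical, as are the function names; the centroid learning part of ASNI is not shown.

```python
import math


def sparsity_at_epoch(epoch, total_epochs, final_sparsity, steepness=10.0):
    """Hypothetical sigmoid schedule rising gradually toward final_sparsity."""
    midpoint = total_epochs / 2.0
    z = steepness * (epoch - midpoint) / total_epochs
    return final_sparsity / (1.0 + math.exp(-z))


def global_magnitude_prune(layers, sparsity):
    """Zero the smallest-magnitude weights across all layers at once."""
    magnitudes = sorted(abs(w) for layer in layers for w in layer)
    n_prune = int(len(magnitudes) * sparsity)
    if n_prune == 0:
        return [list(layer) for layer in layers]
    # One threshold shared by every layer; ties at the threshold are pruned.
    threshold = magnitudes[n_prune - 1]
    return [[0.0 if abs(w) <= threshold else w for w in layer]
            for layer in layers]


# Toy network: two "layers" stored as flat weight lists.
layers = [[0.05, -0.8, 0.3], [1.2, -0.1, 0.6, -0.02, 0.4]]
pruned = global_magnitude_prune(layers, 0.5)

# Sparsity targets that would be applied after each of 10 training epochs.
schedule = [sparsity_at_epoch(e, 10, final_sparsity=0.9) for e in range(1, 11)]
```

Because the threshold is computed over all layers together, layers with many small weights end up sparser than others, which is the sense in which the pruning is global rather than per-layer.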

SimReg: Regression as a Simple Yet Effective Tool for Self-supervised Knowledge Distillation

Jan 13, 2022

K L Navaneet, Soroush Abbasi Koohpayegani, Ajinkya Tejankar, Hamed Pirsiavash

Abstract: Feature regression is a simple way to distill large neural network models to smaller ones. We show that with simple changes to the network architecture, regression can outperform more complex state-of-the-art approaches for knowledge distillation from self-supervised models. Surprisingly, the addition of a multi-layer perceptron head to the CNN backbone is beneficial even if used only during distillation and discarded in the downstream task. Deeper non-linear projections can thus be used to accurately mimic the teacher without changing inference architecture and time. Moreover, we utilize independent projection heads to simultaneously distill multiple teacher networks. We also find that using the same weakly augmented image as input for both teacher and student networks aids distillation. Experiments on the ImageNet dataset demonstrate the efficacy of the proposed changes in various self-supervised distillation settings.

* In BMVC 2021. Code available at: https://github.com/UCDvision/simreg
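
The idea in the abstract above, regressing teacher features through an MLP head that is used only during distillation, can be sketched in a few lines. This is an illustration under assumptions: the $\ell_2$ normalization before the squared error, the two-layer head, all weight values, and the function names are hypothetical and are not taken from the paper or its code.

```python
import math


def linear(x, weight, bias):
    """Apply a dense layer: one output per row of the weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def mlp_head(features, params):
    """Two-layer MLP head used during distillation only."""
    hidden = [max(0.0, h) for h in linear(features, params["w1"], params["b1"])]
    return linear(hidden, params["w2"], params["b2"])


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v)) + eps
    return [x / norm for x in v]


def regression_loss(prediction, target):
    """Squared distance between l2-normalized prediction and target."""
    p = l2_normalize(prediction)
    t = l2_normalize(target)
    return sum((a - b) ** 2 for a, b in zip(p, t))


# Toy values standing in for backbone and teacher outputs on one image.
student_features = [0.2, 0.4, 0.6]   # student backbone output
teacher_features = [0.3, 0.7]        # frozen teacher output
head = {
    "w1": [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], "b1": [0.0, 0.1],
    "w2": [[1.0, -1.0], [0.5, 0.5]], "b2": [0.0, 0.0],
}

prediction = mlp_head(student_features, head)
loss = regression_loss(prediction, teacher_features)
```

After distillation, only the backbone that produced `student_features` would be kept for the downstream task; the head is discarded, so inference cost is unchanged.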
