Hamed Pirsiavash

University of Maryland Baltimore County

COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning

Nov 01, 2020

Simon Ging, Mohammadreza Zolfaghari, Hamed Pirsiavash, Thomas Brox

Abstract: Many real-world video-text tasks involve different levels of granularity, such as frames and words, clips and sentences, or videos and paragraphs, each with distinct semantics. In this paper, we propose a Cooperative hierarchical Transformer (COOT) to leverage this hierarchy information and model the interactions between different levels of granularity and different modalities. The method consists of three major components: an attention-aware feature aggregation layer, which leverages the local temporal context (intra-level, e.g., within a clip), a contextual transformer to learn the interactions between low-level and high-level semantics (inter-level, e.g., clip-video, sentence-paragraph), and a cross-modal cycle-consistency loss to connect video and text. The resulting method compares favorably to the state of the art on several benchmarks while having few parameters. All code is available open-source at https://github.com/gingsi/coot-videotext

* 27 pages, 5 figures, 19 tables. To be published in the 34th conference on Neural Information Processing Systems (NeurIPS 2020). The first two authors contributed equally to this work
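
The cross-modal cycle-consistency idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it assumes one common formulation in which a clip embedding is mapped to a soft nearest neighbor among the sentence embeddings, mapped back to a soft position among the clips, and penalized by the squared distance between the starting index and the recovered position. The function names and the use of squared Euclidean distance are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def soft_nearest_neighbor(query, candidates):
    """Weighted average of candidates, weighted by closeness to the query."""
    weights = softmax([-sq_dist(query, c) for c in candidates])
    dim = len(candidates[0])
    return [sum(w * c[d] for w, c in zip(weights, candidates)) for d in range(dim)]

def cycle_consistency_loss(clips, sentences, i):
    """Clip i -> soft neighbor among sentences -> soft index among clips."""
    s_bar = soft_nearest_neighbor(clips[i], sentences)
    betas = softmax([-sq_dist(s_bar, c) for c in clips])
    mu = sum(b * k for k, b in enumerate(betas))  # soft location on the way back
    return (i - mu) ** 2

# Hypothetical toy embeddings: sentences sit exactly on their clips.
clips = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
aligned_sentences = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
print(cycle_consistency_loss(clips, aligned_sentences, 1))
```

When the two embedding spaces are well aligned, the cycle returns close to the starting index and the loss is near zero; when all sentences cluster near a single clip, the cycle lands on that clip and the loss grows.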

CompRess: Self-Supervised Learning by Compressing Representations

Oct 28, 2020

Soroush Abbasi Koohpayegani, Ajinkya Tejankar, Hamed Pirsiavash

Abstract: Self-supervised learning aims to learn good representations with unlabeled data. Recent works have shown that larger models benefit more from self-supervised learning than smaller models. As a result, the gap between supervised and self-supervised learning has been greatly reduced for larger models. In this work, instead of designing a new pseudo task for self-supervised learning, we develop a model compression method to compress an already learned, deep self-supervised model (teacher) to a smaller one (student). We train the student model so that it mimics the relative similarity between the data points in the teacher's embedding space. For AlexNet, our method outperforms all previous methods, including the fully supervised model, on ImageNet linear evaluation (59.0% compared to 56.5%) and on nearest neighbor evaluation (50.7% compared to 41.4%). To the best of our knowledge, this is the first time a self-supervised AlexNet has outperformed a supervised one on ImageNet classification. Our code is available here: https://github.com/UMBCvision/CompRess
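
A minimal sketch of what "mimicking relative similarity" could look like follows. It is not taken from the CompRess repository; it assumes that similarity is cosine similarity to a set of anchor points, turned into a probability distribution with a temperature softmax, and that the student is trained to minimize the KL divergence from the teacher's distribution. The temperature value and all function names are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_distribution(query, anchors, temperature=0.1):
    """Softmax over cosine similarities between the query and each anchor."""
    q = l2_normalize(query)
    sims = [sum(a * b for a, b in zip(q, l2_normalize(anchor))) / temperature
            for anchor in anchors]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def similarity_distillation_loss(t_query, t_anchors, s_query, s_anchors):
    """Penalize the student when its neighbor ranking differs from the teacher's."""
    p_teacher = similarity_distribution(t_query, t_anchors)
    p_student = similarity_distribution(s_query, s_anchors)
    return kl_divergence(p_teacher, p_student)

# Teacher embeds in 3-D, student in 2-D; only the similarities are compared.
teacher_anchors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
student_anchors = [[1.0, 0.0], [0.0, 1.0]]
print(similarity_distillation_loss([1.0, 0.0, 0.0], teacher_anchors,
                                   [0.0, 1.0], student_anchors))
```

Because only similarity distributions are compared, the teacher and student embeddings may have different dimensionality, which is what allows a large model to be compressed into a smaller one.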

A simple baseline for domain adaptation using rotation prediction

Dec 26, 2019

Ajinkya Tejankar, Hamed Pirsiavash

Abstract: Recently, domain adaptation has become an active research area with many applications. The goal is to adapt a model trained in one domain to another domain with scarce annotated data. We propose a simple yet effective method based on self-supervised learning that outperforms or is on par with most state-of-the-art algorithms, e.g., adversarial domain adaptation. Our method involves two phases: predicting random rotations (self-supervised) on the target domain along with correct labels for the source domain (supervised), and then using self-distillation on the target domain. Our simple method achieves state-of-the-art results on semi-supervised domain adaptation on the DomainNet dataset. Further, we observe that the unlabeled target datasets of popular domain adaptation benchmarks do not contain any categories apart from testing categories. We believe this introduces a bias that does not exist in many real applications. We show that removing this bias from the unlabeled data results in a large drop in performance of state-of-the-art methods, while our simple method is relatively robust.
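
The first phase described above can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes rotations are multiples of 90 degrees labeled 0 to 3, that images are plain nested lists, and that the two objectives are combined as a weighted sum. The `rotation_weight` parameter and the function names are hypothetical.

```python
import random

def rotate90(image):
    """Rotate a 2-D list-of-lists image 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]

def make_rotation_example(image, rng):
    """Return (rotated_image, k), where k in {0, 1, 2, 3} is the pseudo-label."""
    k = rng.randrange(4)
    rotated = image
    for _ in range(k):
        rotated = rotate90(rotated)
    return rotated, k

def phase_one_loss(source_label_loss, target_rotation_loss, rotation_weight=1.0):
    """Supervised loss on source labels plus self-supervised rotation loss on target."""
    return source_label_loss + rotation_weight * target_rotation_loss

rng = random.Random(0)
target_image = [[1, 2], [3, 4]]  # stand-in for an unlabeled target-domain image
rotated, label = make_rotation_example(target_image, rng)
print(rotated, label)
```

The rotation label comes for free from the transformation itself, so the target domain contributes a training signal without any human annotation.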

Adversarial Patches Exploiting Contextual Reasoning in Object Detection

Sep 30, 2019

Aniruddha Saha, Akshayvarun Subramanya, Koninika Patil, Hamed Pirsiavash

Abstract: Most fast object detection algorithms that do a single forward pass per image are well known to use spatial context to improve their accuracy. In fact, they must do so to increase the inference speed by processing the image just once. We show that an adversary can attack the model by exploiting contextual reasoning. We develop adversarial attack algorithms that make an object detector blind to a particular category chosen by the adversary even though the patch does not overlap with the missed detections. We also show that limiting the use of contextual reasoning in learning the object detector acts as a form of defense that improves the accuracy of the detector after an attack. We believe defending against our practical adversarial attack algorithms is not easy and needs attention from the research community.

Hidden Trigger Backdoor Attacks

Sep 30, 2019

Aniruddha Saha, Akshayvarun Subramanya, Hamed Pirsiavash

Abstract: With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real-world applications has become an important research topic. Backdoor attacks are a form of adversarial attack on deep networks where the attacker provides poisoned data to the victim to train the model with, and then activates the attack by showing a specific trigger pattern at test time. Most state-of-the-art backdoor attacks either provide mislabeled poisoning data that can be identified by visual inspection, reveal the trigger in the poisoned data, or use noise and perturbation to hide the trigger. We propose a novel form of backdoor attack where poisoned data look natural with correct labels and, more importantly, the attacker hides the trigger in the poisoned data and keeps the trigger secret until test time. We perform an extensive study on various image classification settings and show that our attack can fool the model by pasting the trigger at random locations on unseen images although the model performs well on clean data. We also show that our proposed attack cannot be easily defended using a state-of-the-art defense algorithm for backdoor attacks.

Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs

Jun 26, 2019

Soheil Kolouri, Aniruddha Saha, Hamed Pirsiavash, Heiko Hoffmann

Abstract: The unprecedented success of deep neural networks in various applications has made these networks a prime target for adversarial exploitation. In this paper, we introduce a benchmark technique for detecting backdoor attacks (aka Trojan attacks) on deep convolutional neural networks (CNNs). We introduce the concept of Universal Litmus Patterns (ULPs), which enable one to reveal backdoor attacks by feeding these universal patterns to the network and analyzing the output (i.e., classifying as 'clean' or 'corrupted'). This detection is fast because it requires only a few forward passes through a CNN. We demonstrate the effectiveness of ULPs for detecting backdoor attacks on thousands of networks trained on three benchmark datasets, namely the German Traffic Sign Recognition Benchmark (GTSRB), MNIST, and CIFAR10.

Towards Hiding Adversarial Examples from Network Interpretation

Dec 06, 2018

Akshayvarun Subramanya, Vipin Pillai, Hamed Pirsiavash

Abstract: Deep networks have been shown to be fooled rather easily using adversarial attack algorithms. Practical methods such as adversarial patches have been shown to be extremely effective in causing misclassification. However, these patches can be highlighted using standard network interpretation algorithms, thus revealing the identity of the adversary. We show that it is possible to create adversarial patches which not only fool the prediction, but also change what we interpret regarding the cause of prediction. We show that our algorithms can empower adversarial patches, by hiding them from network interpretation tools. We believe our algorithms can facilitate developing more robust network interpretation tools that truly explain the network's underlying decision making process.

Boosting Self-Supervised Learning via Knowledge Transfer

May 01, 2018

Mehdi Noroozi, Ananth Vinjimoor, Paolo Favaro, Hamed Pirsiavash

Abstract: In self-supervised learning, one trains a model to solve a so-called pretext task on a dataset without the need for human annotation. The main objective, however, is to transfer this model to a target domain and task. Currently, the most effective transfer strategy is fine-tuning, which restricts one to using the same model or parts thereof for both pretext and target tasks. In this paper, we present a novel framework for self-supervised learning that overcomes limitations in designing and comparing different tasks, models, and data domains. In particular, our framework decouples the structure of the self-supervised model from the final task-specific fine-tuned model. This allows us to: 1) quantitatively assess previously incompatible models including handcrafted features; 2) show that deeper neural network models can learn better representations from the same pretext task; 3) transfer knowledge learned with a deep model to a shallower one and thus boost its learning. We use this framework to design a novel self-supervised task, which achieves state-of-the-art performance on the common benchmarks in PASCAL VOC 2007, ILSVRC12 and Places by a significant margin. Our learned features shrink the mAP gap between models trained via self-supervised learning and supervised learning from 5.9% to 2.6% in object detection on PASCAL VOC 2007.

Representation Learning by Learning to Count

Aug 22, 2017

Mehdi Noroozi, Hamed Pirsiavash, Paolo Favaro

Abstract: We introduce a novel method for representation learning that uses an artificial supervision signal based on counting visual primitives. This supervision signal is obtained from an equivariance relation, which does not require any manual annotation. We relate transformations of images to transformations of the representations. More specifically, we look for the representation that satisfies such a relation rather than the transformations that match a given representation. In this paper, we use two image transformations in the context of counting: scaling and tiling. The first transformation exploits the fact that the number of visual primitives should be invariant to scale. The second transformation allows us to equate the total number of visual primitives in each tile to that in the whole image. These two transformations are combined in one constraint and used to train a neural network with a contrastive loss. The proposed task produces representations that perform on par with or exceed the state of the art in transfer learning benchmarks.

* ICCV 2017 (oral)
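
The combined scaling-and-tiling constraint can be written down as a small sketch. It is not the authors' code and makes several assumptions: the network output is a count vector, the image is split into four tiles, the contrastive term uses a hinge with a margin, and the margin value and function name are hypothetical. The count vectors below stand in for network outputs.

```python
def counting_loss(count_scaled_x, tile_counts_x, count_scaled_y, margin=10.0):
    """Contrastive counting loss for one image x and a different image y.

    count_scaled_x: count vector of the downscaled image x
    tile_counts_x:  list of count vectors, one per tile of x
    count_scaled_y: count vector of a downscaled, different image y
    """
    # Tiling: counts over the tiles should add up to the whole-image count.
    tile_sum = [sum(values) for values in zip(*tile_counts_x)]
    # Scaling: the downscaled image should give the same count as the tile sum.
    matching = sum((a - b) ** 2 for a, b in zip(count_scaled_x, tile_sum))
    # Contrast: a different image should give a count at least `margin` away,
    # which rules out the trivial solution of always predicting zero.
    mismatch = sum((a - b) ** 2 for a, b in zip(count_scaled_y, tile_sum))
    return matching + max(0.0, margin - mismatch)

tiles = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]  # sums to [4, 2]
print(counting_loss([4.0, 2.0], tiles, [0.0, 0.0]))
```

The loss vanishes when the tile counts sum to the count of the downscaled image and a different image produces a sufficiently different count.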

Predicting Motivations of Actions by Leveraging Text

Nov 30, 2016

Carl Vondrick, Deniz Oktay, Hamed Pirsiavash, Antonio Torralba

Abstract: Understanding human actions is a key problem in computer vision. However, recognizing actions is only the first step of understanding what a person is doing. In this paper, we introduce the problem of predicting why a person has performed an action in images. This problem has many applications in human activity understanding, such as anticipating or explaining an action. To study this problem, we introduce a new dataset of people performing actions annotated with likely motivations. However, the information in an image alone may not be sufficient to automatically solve this task. Since humans can rely on their lifetime of experiences to infer motivation, we propose to give computer vision systems access to some of these experiences by using recently developed natural language models to mine knowledge stored in massive amounts of text. While we are still far away from fully understanding motivation, our results suggest that transferring knowledge from language into vision can help machines understand why people in images might be performing an action.

* CVPR 2016
