Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Philip H. S. Torr

University of Oxford

Incremental Tube Construction for Human Action Detection

Jul 23, 2018

Harkirat Singh Behl, Michael Sapienza, Gurkirt Singh, Suman Saha, Fabio Cuzzolin, Philip H. S. Torr

Figure 1 for Incremental Tube Construction for Human Action Detection

Figure 2 for Incremental Tube Construction for Human Action Detection

Figure 3 for Incremental Tube Construction for Human Action Detection

Figure 4 for Incremental Tube Construction for Human Action Detection

Abstract:Current state-of-the-art action detection systems are tailored for offline batch-processing applications. However, for online applications like human-robot interaction, current systems fall short, either because they only detect one action per video, or because they assume that the entire video is available ahead of time. In this work, we introduce a real-time and online joint-labelling and association algorithm for action detection that can incrementally construct space-time action tubes on the most challenging action videos in which different action categories occur concurrently. In contrast to previous methods, we solve the detection-window association and action labelling problems jointly in a single pass. We demonstrate superior online association accuracy and speed (2.2ms per frame) as compared to the current state-of-the-art offline systems. We further demonstrate that the entire action detection pipeline can easily be made to work effectively in real-time using our action tube construction algorithm.

* British Machine Vision Conference (BMVC) 2018

Via

Access Paper or Ask Questions

With Friends Like These, Who Needs Adversaries?

Jul 23, 2018

Saumya Jetley, Nicholas A. Lord, Philip H. S. Torr

Figure 1 for With Friends Like These, Who Needs Adversaries?

Figure 2 for With Friends Like These, Who Needs Adversaries?

Figure 3 for With Friends Like These, Who Needs Adversaries?

Figure 4 for With Friends Like These, Who Needs Adversaries?

Abstract:The vulnerability of deep image classification networks to adversarial attack is now well known, but less well understood. Via a novel experimental analysis, we illustrate some facts about deep convolutional networks (DCNs) that shed new light on their behaviour and its connection to the problem of adversaries, with two key results. The first is a straightforward explanation of the existence of universal adversarial perturbations and their association with specific class identities, obtained by analysing the properties of nets' logit responses as functions of 1D movements along specific image-space directions. The second is the clear demonstration of the tight coupling between classification performance and vulnerability to adversarial attack within the spaces spanned by these directions. Prior work has noted the importance of low-dimensional subspaces in adversarial vulnerability: we illustrate that this likewise represents the nets' notion of saliency. In all, we provide a digestible perspective from which to understand previously reported results which have appeared disjoint or contradictory, with implications for efforts to construct neural nets that are both accurate and robust to adversarial attack.

* 15 pages, 7 figures, 1 table

Via

Access Paper or Ask Questions

Multi-Agent Diverse Generative Adversarial Networks

Jul 16, 2018

Arnab Ghosh, Viveka Kulharia, Vinay Namboodiri, Philip H. S. Torr, Puneet K. Dokania

Figure 1 for Multi-Agent Diverse Generative Adversarial Networks

Figure 2 for Multi-Agent Diverse Generative Adversarial Networks

Figure 3 for Multi-Agent Diverse Generative Adversarial Networks

Figure 4 for Multi-Agent Diverse Generative Adversarial Networks

Abstract:We propose MAD-GAN, an intuitive generalization to the Generative Adversarial Networks (GANs) and its conditional variants to address the well known problem of mode collapse. First, MAD-GAN is a multi-agent GAN architecture incorporating multiple generators and one discriminator. Second, to enforce that different generators capture diverse high probability modes, the discriminator of MAD-GAN is designed such that along with finding the real and fake samples, it is also required to identify the generator that generated the given fake sample. Intuitively, to succeed in this task, the discriminator must learn to push different generators towards different identifiable modes. We perform extensive experiments on synthetic and real datasets and compare MAD-GAN with different variants of GAN. We show high quality diverse sample generations for challenging tasks such as image-to-image translation and face generation. In addition, we also show that MAD-GAN is able to disentangle different modalities when trained using highly challenging diverse-class dataset (e.g. dataset with images of forests, icebergs, and bedrooms). In the end, we show its efficacy on the unsupervised feature representation task. In Appendix, we introduce a similarity based competing objective (MAD-GAN-Sim) which encourages different generators to generate diverse samples based on a user defined similarity metric. We show its performance on the image-to-image translation, and also show its effectiveness on the unsupervised feature representation task.

* This is an updated version of our CVPR'18 paper with the same title. In this version, we also introduce MAD-GAN-Sim in Appendix B

Via

Access Paper or Ask Questions

On the Robustness of Semantic Segmentation Models to Adversarial Attacks

Jul 08, 2018

Anurag Arnab, Ondrej Miksik, Philip H. S. Torr

Figure 1 for On the Robustness of Semantic Segmentation Models to Adversarial Attacks

Figure 2 for On the Robustness of Semantic Segmentation Models to Adversarial Attacks

Figure 3 for On the Robustness of Semantic Segmentation Models to Adversarial Attacks

Figure 4 for On the Robustness of Semantic Segmentation Models to Adversarial Attacks

Abstract:Deep Neural Networks (DNNs) have demonstrated exceptional performance on most recognition tasks such as image classification and segmentation. However, they have also been shown to be vulnerable to adversarial examples. This phenomenon has recently attracted a lot of attention but it has not been extensively studied on multiple, large-scale datasets and structured prediction tasks such as semantic segmentation which often require more specialised networks with additional components such as CRFs, dilated convolutions, skip-connections and multiscale processing. In this paper, we present what to our knowledge is the first rigorous evaluation of adversarial attacks on modern semantic segmentation models, using two large-scale datasets. We analyse the effect of different network architectures, model capacity and multiscale processing, and show that many observations made on the task of classification do not always transfer to this more complex task. Furthermore, we show how mean-field inference in deep structured models, multiscale processing (and more generally, input transformations) naturally implement recently proposed adversarial defenses. Our observations will aid future efforts in understanding and defending against adversarial examples. Moreover, in the shorter term, we show how to effectively benchmark robustness and show which segmentation models should currently be preferred in safety-critical applications due to their inherent robustness.

* CVPR 2018 extended version

Via

Access Paper or Ask Questions

Intriguing Properties of Learned Representations

Jun 11, 2018

Amartya Sanyal, Varun Kanade, Philip H. S. Torr

Figure 1 for Intriguing Properties of Learned Representations

Figure 2 for Intriguing Properties of Learned Representations

Figure 3 for Intriguing Properties of Learned Representations

Figure 4 for Intriguing Properties of Learned Representations

Abstract:A key feature of neural networks, particularly deep convolutional neural networks, is their ability to "learn" useful representations from data. The very last layer of a neural network is then simply a linear model trained on these "learned" representations. Despite their numerous applications in other tasks such as classification, retrieval, clustering etc., a.k.a. transfer learning, not much work has been published that investigates the structure of these representations or indeed whether structure can be imposed on them during the training process. In this paper, we study the effective dimensionality of the learned representations by models that have proved highly successful for image classification. We focus on ResNet-18, ResNet-50 and VGG-19 and observe that when trained on CIFAR10 or CIFAR100, the learned representations exhibit a fairly low rank structure. We propose a modification to the training procedure, which further encourages low rank structure on learned activations. Empirically, we show that this has implications for robustness to adversarial examples and compression.

Via

Access Paper or Ask Questions

Value Propagation Networks

May 28, 2018

Nantas Nardelli, Gabriel Synnaeve, Zeming Lin, Pushmeet Kohli, Philip H. S. Torr, Nicolas Usunier

Abstract:We present Value Propagation (VProp), a parameter-efficient differentiable planning module built on Value Iteration which can successfully be trained using reinforcement learning to solve unseen tasks, has the capability to generalize to larger map sizes, and can learn to navigate in dynamic environments. Furthermore, we show that the module enables learning to plan when the environment also includes stochastic elements, providing a cost-efficient learning system to build low-level size-invariant planners for a variety of interactive navigation problems. We evaluate on static and dynamic configurations of MazeBase grid-worlds, with randomly generated environments of several different sizes, and on a StarCraft navigation scenario, with more complex dynamics, and pixels as input.

Via

Access Paper or Ask Questions

A Unified View of Piecewise Linear Neural Network Verification

May 22, 2018

Rudy Bunel, Ilker Turkaslan, Philip H. S. Torr, Pushmeet Kohli, M. Pawan Kumar

Figure 1 for A Unified View of Piecewise Linear Neural Network Verification

Figure 2 for A Unified View of Piecewise Linear Neural Network Verification

Figure 3 for A Unified View of Piecewise Linear Neural Network Verification

Figure 4 for A Unified View of Piecewise Linear Neural Network Verification

Abstract:The success of Deep Learning and its potential use in many safety-critical applications has motivated research on formal verification of Neural Network (NN) models. Despite the reputation of learned NN models to behave as black boxes and the theoretical hardness of proving their properties, researchers have been successful in verifying some classes of models by exploiting their piecewise linear structure and taking insights from formal methods such as Satisifiability Modulo Theory. These methods are however still far from scaling to realistic neural networks. To facilitate progress on this crucial area, we make two key contributions. First, we present a unified framework that encompasses previous methods. This analysis results in the identification of new methods that combine the strengths of multiple existing approaches, accomplishing a speedup of two orders of magnitude compared to the previous state of the art. Second, we propose a new data set of benchmarks which includes a collection of previously released testcases. We use the benchmark to provide the first experimental comparison of existing algorithms and identify the factors impacting the hardness of verification problems.

* Updated version of "Piecewise Linear Neural Network verification: A comparative study"

Via

Access Paper or Ask Questions

Meta-learning with differentiable closed-form solvers

May 21, 2018

Luca Bertinetto, João F. Henriques, Philip H. S. Torr, Andrea Vedaldi

Figure 1 for Meta-learning with differentiable closed-form solvers

Figure 2 for Meta-learning with differentiable closed-form solvers

Figure 3 for Meta-learning with differentiable closed-form solvers

Figure 4 for Meta-learning with differentiable closed-form solvers

Abstract:Adapting deep networks to new concepts from few examples is extremely challenging, due to the high computational and data requirements of standard fine-tuning procedures. Most works on meta-learning and few-shot learning have thus focused on simple learning techniques for adaptation, such as nearest neighbors or gradient descent. Nonetheless, the machine learning literature contains a wealth of methods that learn non-deep models very efficiently. In this work we propose to use these fast convergent methods as the main adaptation mechanism for few-shot learning. The main idea is to teach a deep network to use standard machine learning tools, such as logistic regression, as part of its own internal model, enabling it to quickly adapt to novel tasks. This requires back-propagating errors through the solver steps. While normally the matrix operations involved would be costly, the small number of examples works to our advantage, by making use of the Woodbury identity. We propose both iterative and closed-form solvers, based on logistic regression and ridge regression components. Our methods achieve excellent performance on three few-shot learning benchmarks, showing competitive performance on Omniglot and surpassing all state-of-the-art alternatives on miniImageNet and CIFAR-100.

Via

Access Paper or Ask Questions

Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

May 21, 2018

Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip H. S. Torr, Pushmeet Kohli, Shimon Whiteson

Figure 1 for Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

Figure 2 for Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

Figure 3 for Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

Figure 4 for Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

Abstract:Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A major stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep Q-learning relies. This paper proposes two methods that address this problem: 1) using a multi-agent variant of importance sampling to naturally decay obsolete data and 2) conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL.

* Camera-ready version, International Conference of Machine Learning 2017; updated to fix print-breaking image

Via

Access Paper or Ask Questions

Learn To Pay Attention

Apr 26, 2018

Saumya Jetley, Nicholas A. Lord, Namhoon Lee, Philip H. S. Torr

Abstract:We propose an end-to-end-trainable attention module for convolutional neural network (CNN) architectures built for image classification. The module takes as input the 2D feature vector maps which form the intermediate representations of the input image at different stages in the CNN pipeline, and outputs a 2D matrix of scores for each map. Standard CNN architectures are modified through the incorporation of this module, and trained under the constraint that a convex combination of the intermediate 2D feature vectors, as parameterised by the score matrices, must \textit{alone} be used for classification. Incentivised to amplify the relevant and suppress the irrelevant or misleading, the scores thus assume the role of attention values. Our experimental observations provide clear evidence to this effect: the learned attention maps neatly highlight the regions of interest while suppressing background clutter. Consequently, the proposed function is able to bootstrap standard CNN architectures for the task of image classification, demonstrating superior generalisation over 6 unseen benchmark datasets. When binarised, our attention maps outperform other CNN-based attention maps, traditional saliency maps, and top object proposals for weakly supervised segmentation as demonstrated on the Object Discovery dataset. We also demonstrate improved robustness against the fast gradient sign method of adversarial attack.

* International Conference on Learning Representations 2018

Via

Access Paper or Ask Questions