Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Philip H. S. Torr

University of Oxford

Memory-Driven Text-to-Image Generation

Aug 15, 2022

Bowen Li, Philip H. S. Torr, Thomas Lukasiewicz

Figure 1 for Memory-Driven Text-to-Image Generation

Figure 2 for Memory-Driven Text-to-Image Generation

Figure 3 for Memory-Driven Text-to-Image Generation

Figure 4 for Memory-Driven Text-to-Image Generation

Abstract:We introduce a memory-driven semi-parametric approach to text-to-image generation, which is based on both parametric and non-parametric techniques. The non-parametric component is a memory bank of image features constructed from a training set of images. The parametric component is a generative adversarial network. Given a new text description at inference time, the memory bank is used to selectively retrieve image features that are provided as basic information of target images, which enables the generator to produce realistic synthetic results. We also incorporate the content information into the discriminator, together with semantic features, allowing the discriminator to make a more reliable prediction. Experimental results demonstrate that the proposed memory-driven semi-parametric approach produces more realistic images than purely parametric approaches, in terms of both visual fidelity and text-image semantic consistency.

Via

Access Paper or Ask Questions

An Impartial Take to the CNN vs Transformer Robustness Contest

Jul 22, 2022

Francesco Pinto, Philip H. S. Torr, Puneet K. Dokania

Figure 1 for An Impartial Take to the CNN vs Transformer Robustness Contest

Figure 2 for An Impartial Take to the CNN vs Transformer Robustness Contest

Figure 3 for An Impartial Take to the CNN vs Transformer Robustness Contest

Figure 4 for An Impartial Take to the CNN vs Transformer Robustness Contest

Abstract:Following the surge of popularity of Transformers in Computer Vision, several studies have attempted to determine whether they could be more robust to distribution shifts and provide better uncertainty estimates than Convolutional Neural Networks (CNNs). The almost unanimous conclusion is that they are, and it is often conjectured more or less explicitly that the reason of this supposed superiority is to be attributed to the self-attention mechanism. In this paper we perform extensive empirical analyses showing that recent state-of-the-art CNNs (particularly, ConvNeXt) can be as robust and reliable or even sometimes more than the current state-of-the-art Transformers. However, there is no clear winner. Therefore, although it is tempting to state the definitive superiority of one family of architectures over another, they seem to enjoy similar extraordinary performances on a variety of tasks while also suffering from similar vulnerabilities such as texture, background, and simplicity biases.

* ECCV 2022

Via

Access Paper or Ask Questions

Sample-dependent Adaptive Temperature Scaling for Improved Calibration

Jul 22, 2022

Tom Joy, Francesco Pinto, Ser-Nam Lim, Philip H. S. Torr, Puneet K. Dokania

Figure 1 for Sample-dependent Adaptive Temperature Scaling for Improved Calibration

Figure 2 for Sample-dependent Adaptive Temperature Scaling for Improved Calibration

Figure 3 for Sample-dependent Adaptive Temperature Scaling for Improved Calibration

Figure 4 for Sample-dependent Adaptive Temperature Scaling for Improved Calibration

Abstract:It is now well known that neural networks can be wrong with high confidence in their predictions, leading to poor calibration. The most common post-hoc approach to compensate for this is to perform temperature scaling, which adjusts the confidences of the predictions on any input by scaling the logits by a fixed value. Whilst this approach typically improves the average calibration across the whole test dataset, this improvement typically reduces the individual confidences of the predictions irrespective of whether the classification of a given input is correct or incorrect. With this insight, we base our method on the observation that different samples contribute to the calibration error by varying amounts, with some needing to increase their confidence and others needing to decrease it. Therefore, for each input, we propose to predict a different temperature value, allowing us to adjust the mismatch between confidence and accuracy at a finer granularity. Furthermore, we observe improved results on OOD detection and can also extract a notion of hardness for the data-points. Our method is applied post-hoc, consequently using very little computation time and with a negligible memory footprint and is applied to off-the-shelf pre-trained classifiers. We test our method on the ResNet50 and WideResNet28-10 architectures using the CIFAR10/100 and Tiny-ImageNet datasets, showing that producing per-data-point temperatures is beneficial also for the expected calibration error across the whole test set. Code is available at: https://github.com/thwjoy/adats.

Via

Access Paper or Ask Questions

Illusionary Attacks on Sequential Decision Makers and Countermeasures

Jul 20, 2022

Tim Franzmeyer, João F. Henriques, Jakob N. Foerster, Philip H. S. Torr, Adel Bibi, Christian Schroeder de Witt

Figure 1 for Illusionary Attacks on Sequential Decision Makers and Countermeasures

Figure 2 for Illusionary Attacks on Sequential Decision Makers and Countermeasures

Figure 3 for Illusionary Attacks on Sequential Decision Makers and Countermeasures

Figure 4 for Illusionary Attacks on Sequential Decision Makers and Countermeasures

Abstract:Autonomous intelligent agents deployed to the real-world need to be robust against adversarial attacks on sensory inputs. Existing work in reinforcement learning focuses on minimum-norm perturbation attacks, which were originally introduced to mimic a notion of perceptual invariance in computer vision. In this paper, we note that such minimum-norm perturbation attacks can be trivially detected by victim agents, as these result in observation sequences that are not consistent with the victim agent's actions. Furthermore, many real-world agents, such as physical robots, commonly operate under human supervisors, which are not susceptible to such perturbation attacks. As a result, we propose to instead focus on illusionary attacks, a novel form of attack that is consistent with the world model of the victim agent. We provide a formal definition of this novel attack framework, explore its characteristics under a variety of conditions, and conclude that agents must seek realism feedback to be robust to illusionary attacks.

Via

Access Paper or Ask Questions

SiamMask: A Framework for Fast Online Object Tracking and Segmentation

Jul 05, 2022

Weiming Hu, Qiang Wang, Li Zhang, Luca Bertinetto, Philip H. S. Torr

Figure 1 for SiamMask: A Framework for Fast Online Object Tracking and Segmentation

Figure 2 for SiamMask: A Framework for Fast Online Object Tracking and Segmentation

Figure 3 for SiamMask: A Framework for Fast Online Object Tracking and Segmentation

Figure 4 for SiamMask: A Framework for Fast Online Object Tracking and Segmentation

Abstract:In this paper we introduce SiamMask, a framework to perform both visual object tracking and video object segmentation, in real-time, with the same simple method. We improve the offline training procedure of popular fully-convolutional Siamese approaches by augmenting their losses with a binary segmentation task. Once the offline training is completed, SiamMask only requires a single bounding box for initialization and can simultaneously carry out visual object tracking and segmentation at high frame-rates. Moreover, we show that it is possible to extend the framework to handle multiple object tracking and segmentation by simply re-using the multi-task model in a cascaded fashion. Experimental results show that our approach has high processing efficiency, at around 55 frames per second. It yields real-time state-of-the-art results on visual-object tracking benchmarks, while at the same time demonstrating competitive performance at a high speed for video object segmentation benchmarks.

* IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2022
* 17 pages, Accepted by TPAMI 2022. arXiv admin note: substantial text overlap with arXiv:1812.05050

Via

Access Paper or Ask Questions

RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out Distribution Robustness

Jun 29, 2022

Francesco Pinto, Harry Yang, Ser-Nam Lim, Philip H. S. Torr, Puneet K. Dokania

Figure 1 for RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out Distribution Robustness

Figure 2 for RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out Distribution Robustness

Figure 3 for RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out Distribution Robustness

Figure 4 for RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out Distribution Robustness

Abstract:We show that the effectiveness of the well celebrated Mixup [Zhang et al., 2018] can be further improved if instead of using it as the sole learning objective, it is utilized as an additional regularizer to the standard cross-entropy loss. This simple change not only provides much improved accuracy but also significantly improves the quality of the predictive uncertainty estimation of Mixup in most cases under various forms of covariate shifts and out-of-distribution detection experiments. In fact, we observe that Mixup yields much degraded performance on detecting out-of-distribution samples possibly, as we show empirically, because of its tendency to learn models that exhibit high-entropy throughout; making it difficult to differentiate in-distribution samples from out-distribution ones. To show the efficacy of our approach (RegMixup), we provide thorough analyses and experiments on vision datasets (ImageNet & CIFAR-10/100) and compare it with a suite of recent approaches for reliable uncertainty estimation.

* 22 pages, 18 figures

Via

Access Paper or Ask Questions

How robust are pre-trained models to distribution shift?

Jun 17, 2022

Yuge Shi, Imant Daunhawer, Julia E. Vogt, Philip H. S. Torr, Amartya Sanyal

Figure 1 for How robust are pre-trained models to distribution shift?

Figure 2 for How robust are pre-trained models to distribution shift?

Figure 3 for How robust are pre-trained models to distribution shift?

Figure 4 for How robust are pre-trained models to distribution shift?

Abstract:The vulnerability of machine learning models to spurious correlations has mostly been discussed in the context of supervised learning (SL). However, there is a lack of insight on how spurious correlations affect the performance of popular self-supervised learning (SSL) and auto-encoder based models (AE). In this work, we shed light on this by evaluating the performance of these models on both real world and synthetic distribution shift datasets. Following observations that the linear head itself can be susceptible to spurious correlations, we develop a novel evaluation scheme with the linear head trained on out-of-distribution (OOD) data, to isolate the performance of the pre-trained models from a potential bias of the linear head used for evaluation. With this new methodology, we show that SSL models are consistently more robust to distribution shifts and thus better at OOD generalisation than AE and SL models.

Via

Access Paper or Ask Questions

Catastrophic overfitting is a bug but also a feature

Jun 16, 2022

Guillermo Ortiz-Jiménez, Pau de Jorge, Amartya Sanyal, Adel Bibi, Puneet K. Dokania, Pascal Frossard, Gregory Rogéz, Philip H. S. Torr

Figure 1 for Catastrophic overfitting is a bug but also a feature

Figure 2 for Catastrophic overfitting is a bug but also a feature

Figure 3 for Catastrophic overfitting is a bug but also a feature

Figure 4 for Catastrophic overfitting is a bug but also a feature

Abstract:Despite clear computational advantages in building robust neural networks, adversarial training (AT) using single-step methods is unstable as it suffers from catastrophic overfitting (CO): Networks gain non-trivial robustness during the first stages of adversarial training, but suddenly reach a breaking point where they quickly lose all robustness in just a few iterations. Although some works have succeeded at preventing CO, the different mechanisms that lead to this remarkable failure mode are still poorly understood. In this work, however, we find that the interplay between the structure of the data and the dynamics of AT plays a fundamental role in CO. Specifically, through active interventions on typical datasets of natural images, we establish a causal link between the structure of the data and the onset of CO in single-step AT methods. This new perspective provides important insights into the mechanisms that lead to CO and paves the way towards a better understanding of the general dynamics of robust model construction. The code to reproduce the experiments of this paper can be found at https://github.com/gortizji/co_features .

* 24 pages, 19 figures, 1 table. Work partially presented at Adversarial Machine Learning Workshop at ICML 2022

Via

Access Paper or Ask Questions

Zero-Shot Logit Adjustment

May 05, 2022

Dubing Chen, Yuming Shen, Haofeng Zhang, Philip H. S. Torr

Abstract:Semantic-descriptor-based Generalized Zero-Shot Learning (GZSL) poses challenges in recognizing novel classes in the test phase. The development of generative models enables current GZSL techniques to probe further into the semantic-visual link, culminating in a two-stage form that includes a generator and a classifier. However, existing generation-based methods focus on enhancing the generator's effect while neglecting the improvement of the classifier. In this paper, we first analyze of two properties of the generated pseudo unseen samples: bias and homogeneity. Then, we perform variational Bayesian inference to back-derive the evaluation metrics, which reflects the balance of the seen and unseen classes. As a consequence of our derivation, the aforementioned two properties are incorporated into the classifier training as seen-unseen priors via logit adjustment. The Zero-Shot Logit Adjustment further puts semantic-based classifiers into effect in generation-based GZSL. Our experiments demonstrate that the proposed technique achieves state-of-the-art when combined with the basic generator, and it can improve various generative Zero-Shot Learning frameworks. Our codes are available on https://github.com/cdb342/IJCAI-2022-ZLA.

* IJCAI 2022

Via

Access Paper or Ask Questions

Towards the Semantic Weak Generalization Problem in Generative Zero-Shot Learning: Ante-hoc and Post-hoc

Apr 24, 2022

Dubing Chen, Yuming Shen, Haofeng Zhang, Philip H. S. Torr

Figure 1 for Towards the Semantic Weak Generalization Problem in Generative Zero-Shot Learning: Ante-hoc and Post-hoc

Figure 2 for Towards the Semantic Weak Generalization Problem in Generative Zero-Shot Learning: Ante-hoc and Post-hoc

Figure 3 for Towards the Semantic Weak Generalization Problem in Generative Zero-Shot Learning: Ante-hoc and Post-hoc

Figure 4 for Towards the Semantic Weak Generalization Problem in Generative Zero-Shot Learning: Ante-hoc and Post-hoc

Abstract:In this paper, we present a simple and effective strategy lowering the previously unexplored factors that limit the performance ceiling of generative Zero-Shot Learning (ZSL). We begin by formally defining semantic generalization, then look into approaches for reducing the semantic weak generalization problem and minimizing its negative influence on classifier training. In the ante-hoc phase, we augment the generator's semantic input, as well as relax the fitting target of the generator. In the post-hoc phase (after generating simulated unseen samples), we derive from the gradient of the loss function to minimize the gradient increment on seen classifier weights carried by biased unseen distribution, which tends to cause misleading on intra-seen class decision boundaries. Without complicated designs, our approach hit the essential problem and significantly outperform the state-of-the-art on four widely used ZSL datasets.

Via

Access Paper or Ask Questions