Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Puneet K. Dokania

Sample-dependent Adaptive Temperature Scaling for Improved Calibration

Jul 22, 2022
Tom Joy, Francesco Pinto, Ser-Nam Lim, Philip H. S. Torr, Puneet K. Dokania

Figure 1 for Sample-dependent Adaptive Temperature Scaling for Improved Calibration

Figure 2 for Sample-dependent Adaptive Temperature Scaling for Improved Calibration

Figure 3 for Sample-dependent Adaptive Temperature Scaling for Improved Calibration

Figure 4 for Sample-dependent Adaptive Temperature Scaling for Improved Calibration

It is now well known that neural networks can be wrong with high confidence in their predictions, leading to poor calibration. The most common post-hoc approach to compensate for this is to perform temperature scaling, which adjusts the confidences of the predictions on any input by scaling the logits by a fixed value. Whilst this approach typically improves the average calibration across the whole test dataset, this improvement typically reduces the individual confidences of the predictions irrespective of whether the classification of a given input is correct or incorrect. With this insight, we base our method on the observation that different samples contribute to the calibration error by varying amounts, with some needing to increase their confidence and others needing to decrease it. Therefore, for each input, we propose to predict a different temperature value, allowing us to adjust the mismatch between confidence and accuracy at a finer granularity. Furthermore, we observe improved results on OOD detection and can also extract a notion of hardness for the data-points. Our method is applied post-hoc, consequently using very little computation time and with a negligible memory footprint and is applied to off-the-shelf pre-trained classifiers. We test our method on the ResNet50 and WideResNet28-10 architectures using the CIFAR10/100 and Tiny-ImageNet datasets, showing that producing per-data-point temperatures is beneficial also for the expected calibration error across the whole test set. Code is available at: https://github.com/thwjoy/adats.

Via

Access Paper or Ask Questions

RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out Distribution Robustness

Jun 29, 2022
Francesco Pinto, Harry Yang, Ser-Nam Lim, Philip H. S. Torr, Puneet K. Dokania

Figure 1 for RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out Distribution Robustness

Figure 2 for RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out Distribution Robustness

Figure 3 for RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out Distribution Robustness

Figure 4 for RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out Distribution Robustness

We show that the effectiveness of the well celebrated Mixup [Zhang et al., 2018] can be further improved if instead of using it as the sole learning objective, it is utilized as an additional regularizer to the standard cross-entropy loss. This simple change not only provides much improved accuracy but also significantly improves the quality of the predictive uncertainty estimation of Mixup in most cases under various forms of covariate shifts and out-of-distribution detection experiments. In fact, we observe that Mixup yields much degraded performance on detecting out-of-distribution samples possibly, as we show empirically, because of its tendency to learn models that exhibit high-entropy throughout; making it difficult to differentiate in-distribution samples from out-distribution ones. To show the efficacy of our approach (RegMixup), we provide thorough analyses and experiments on vision datasets (ImageNet & CIFAR-10/100) and compare it with a suite of recent approaches for reliable uncertainty estimation.

* 22 pages, 18 figures

Via

Access Paper or Ask Questions

Catastrophic overfitting is a bug but also a feature

Jun 16, 2022
Guillermo Ortiz-Jiménez, Pau de Jorge, Amartya Sanyal, Adel Bibi, Puneet K. Dokania, Pascal Frossard, Gregory Rogéz, Philip H. S. Torr

Figure 1 for Catastrophic overfitting is a bug but also a feature

Figure 2 for Catastrophic overfitting is a bug but also a feature

Figure 3 for Catastrophic overfitting is a bug but also a feature

Figure 4 for Catastrophic overfitting is a bug but also a feature

Despite clear computational advantages in building robust neural networks, adversarial training (AT) using single-step methods is unstable as it suffers from catastrophic overfitting (CO): Networks gain non-trivial robustness during the first stages of adversarial training, but suddenly reach a breaking point where they quickly lose all robustness in just a few iterations. Although some works have succeeded at preventing CO, the different mechanisms that lead to this remarkable failure mode are still poorly understood. In this work, however, we find that the interplay between the structure of the data and the dynamics of AT plays a fundamental role in CO. Specifically, through active interventions on typical datasets of natural images, we establish a causal link between the structure of the data and the onset of CO in single-step AT methods. This new perspective provides important insights into the mechanisms that lead to CO and paves the way towards a better understanding of the general dynamics of robust model construction. The code to reproduce the experiments of this paper can be found at https://github.com/gortizji/co_features .

* 24 pages, 19 figures, 1 table. Work partially presented at Adversarial Machine Learning Workshop at ICML 2022

Via

Access Paper or Ask Questions

Make Some Noise: Reliable and Efficient Single-Step Adversarial Training

Feb 02, 2022
Pau de Jorge, Adel Bibi, Riccardo Volpi, Amartya Sanyal, Philip H. S. Torr, Grégory Rogez, Puneet K. Dokania

Figure 1 for Make Some Noise: Reliable and Efficient Single-Step Adversarial Training

Figure 2 for Make Some Noise: Reliable and Efficient Single-Step Adversarial Training

Figure 3 for Make Some Noise: Reliable and Efficient Single-Step Adversarial Training

Figure 4 for Make Some Noise: Reliable and Efficient Single-Step Adversarial Training

Recently, Wong et al. showed that adversarial training with single-step FGSM leads to a characteristic failure mode named catastrophic overfitting (CO), in which a model becomes suddenly vulnerable to multi-step attacks. They showed that adding a random perturbation prior to FGSM (RS-FGSM) seemed to be sufficient to prevent CO. However, Andriushchenko and Flammarion observed that RS-FGSM still leads to CO for larger perturbations, and proposed an expensive regularizer (GradAlign) to avoid CO. In this work, we methodically revisit the role of noise and clipping in single-step adversarial training. Contrary to previous intuitions, we find that using a stronger noise around the clean sample combined with not clipping is highly effective in avoiding CO for large perturbation radii. Based on these observations, we then propose Noise-FGSM (N-FGSM) that, while providing the benefits of single-step adversarial training, does not suffer from CO. Empirical analyses on a large suite of experiments show that N-FGSM is able to match or surpass the performance of previous single-step methods while achieving a 3$\times$ speed-up.

Via

Access Paper or Ask Questions

ANCER: Anisotropic Certification via Sample-wise Volume Maximization

Jul 20, 2021
Francisco Eiras, Motasem Alfarra, M. Pawan Kumar, Philip H. S. Torr, Puneet K. Dokania, Bernard Ghanem, Adel Bibi

Figure 1 for ANCER: Anisotropic Certification via Sample-wise Volume Maximization

Figure 2 for ANCER: Anisotropic Certification via Sample-wise Volume Maximization

Figure 3 for ANCER: Anisotropic Certification via Sample-wise Volume Maximization

Figure 4 for ANCER: Anisotropic Certification via Sample-wise Volume Maximization

Randomized smoothing has recently emerged as an effective tool that enables certification of deep neural network classifiers at scale. All prior art on randomized smoothing has focused on isotropic $\ell_p$ certification, which has the advantage of yielding certificates that can be easily compared among isotropic methods via $\ell_p$-norm radius. However, isotropic certification limits the region that can be certified around an input to worst-case adversaries, i.e., it cannot reason about other "close", potentially large, constant prediction safe regions. To alleviate this issue, (i) we theoretically extend the isotropic randomized smoothing $\ell_1$ and $\ell_2$ certificates to their generalized anisotropic counterparts following a simplified analysis. Moreover, (ii) we propose evaluation metrics allowing for the comparison of general certificates - a certificate is superior to another if it certifies a superset region - with the quantification of each certificate through the volume of the certified region. We introduce ANCER, a practical framework for obtaining anisotropic certificates for a given test set sample via volume maximization. Our empirical results demonstrate that ANCER achieves state-of-the-art $\ell_1$ and $\ell_2$ certified accuracy on both CIFAR-10 and ImageNet at multiple radii, while certifying substantially larger regions in terms of volume, thus highlighting the benefits of moving away from isotropic analysis. Code used in our experiments is available in https://github.com/MotasemAlfarra/ANCER.

* First two authors and the last one contributed equally to this work

Via

Access Paper or Ask Questions

No Cost Likelihood Manipulation at Test Time for Making Better Mistakes in Deep Networks

Apr 01, 2021
Shyamgopal Karthik, Ameya Prabhu, Puneet K. Dokania, Vineet Gandhi

Figure 1 for No Cost Likelihood Manipulation at Test Time for Making Better Mistakes in Deep Networks

Figure 2 for No Cost Likelihood Manipulation at Test Time for Making Better Mistakes in Deep Networks

Figure 3 for No Cost Likelihood Manipulation at Test Time for Making Better Mistakes in Deep Networks

Figure 4 for No Cost Likelihood Manipulation at Test Time for Making Better Mistakes in Deep Networks

There has been increasing interest in building deep hierarchy-aware classifiers that aim to quantify and reduce the severity of mistakes, and not just reduce the number of errors. The idea is to exploit the label hierarchy (e.g., the WordNet ontology) and consider graph distances as a proxy for mistake severity. Surprisingly, on examining mistake-severity distributions of the top-1 prediction, we find that current state-of-the-art hierarchy-aware deep classifiers do not always show practical improvement over the standard cross-entropy baseline in making better mistakes. The reason for the reduction in average mistake-severity can be attributed to the increase in low-severity mistakes, which may also explain the noticeable drop in their accuracy. To this end, we use the classical Conditional Risk Minimization (CRM) framework for hierarchy-aware classification. Given a cost matrix and a reliable estimate of likelihoods (obtained from a trained network), CRM simply amends mistakes at inference time; it needs no extra hyperparameters and requires adding just a few lines of code to the standard cross-entropy baseline. It significantly outperforms the state-of-the-art and consistently obtains large reductions in the average hierarchical distance of top-$k$ predictions across datasets, with very little loss in accuracy. CRM, because of its simplicity, can be used with any off-the-shelf trained model that provides reliable likelihood estimates.

Via

Access Paper or Ask Questions

On Batch Normalisation for Approximate Bayesian Inference

Dec 24, 2020
Jishnu Mukhoti, Puneet K. Dokania, Philip H. S. Torr, Yarin Gal

Figure 1 for On Batch Normalisation for Approximate Bayesian Inference

Figure 2 for On Batch Normalisation for Approximate Bayesian Inference

We study batch normalisation in the context of variational inference methods in Bayesian neural networks, such as mean-field or MC Dropout. We show that batch-normalisation does not affect the optimum of the evidence lower bound (ELBO). Furthermore, we study the Monte Carlo Batch Normalisation (MCBN) algorithm, proposed as an approximate inference technique parallel to MC Dropout, and show that for larger batch sizes, MCBN fails to capture epistemic uncertainty. Finally, we provide insights into what is required to fix this failure, namely having to view the mini-batch size as a variational parameter in MCBN. We comment on the asymptotics of the ELBO with respect to this variational parameter, showing that as dataset size increases towards infinity, the batch-size must increase towards infinity as well for MCBN to be a valid approximate inference technique.

Via

Access Paper or Ask Questions

Continual Learning in Low-rank Orthogonal Subspaces

Oct 22, 2020
Arslan Chaudhry, Naeemullah Khan, Puneet K. Dokania, Philip H. S. Torr

Figure 1 for Continual Learning in Low-rank Orthogonal Subspaces

Figure 2 for Continual Learning in Low-rank Orthogonal Subspaces

Figure 3 for Continual Learning in Low-rank Orthogonal Subspaces

Figure 4 for Continual Learning in Low-rank Orthogonal Subspaces

In continual learning (CL), a learner is faced with a sequence of tasks, arriving one after the other, and the goal is to remember all the tasks once the continual learning experience is finished. The prior art in CL uses episodic memory, parameter regularization or extensible network structures to reduce interference among tasks, but in the end, all the approaches learn different tasks in a joint vector space. We believe this invariably leads to interference among different tasks. We propose to learn tasks in different (low-rank) vector subspaces that are kept orthogonal to each other in order to minimize interference. Further, to keep the gradients of different tasks coming from these subspaces orthogonal to each other, we learn isometric mappings by posing network training as an optimization problem over the Stiefel manifold. To the best of our understanding, we report, for the first time, strong results over experience-replay baseline with and without memory on standard classification benchmarks in continual learning. The code is made publicly available.

* NeurIPS, 2020
* The paper is accepted at NeurIPS'20

Via

Access Paper or Ask Questions