Ioannis Mitliagkas

Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection

Aug 22, 2023
Charles Guille-Escuret, Pierre-André Noël, Ioannis Mitliagkas, David Vazquez, Joao Monteiro

Improving the reliability of deployed machine learning systems often involves developing methods to detect out-of-distribution (OOD) inputs. However, existing research often narrowly focuses on samples from classes that are absent from the training set, neglecting other types of plausible distribution shifts. This limitation reduces the applicability of these methods in real-world scenarios, where systems encounter a wide variety of anomalous inputs. In this study, we categorize five distinct types of distribution shifts and critically evaluate the performance of recent OOD detection methods on each of them. We publicly release our benchmark under the name BROAD (Benchmarking Resilience Over Anomaly Diversity). Our findings reveal that while these methods excel in detecting unknown classes, their performance is inconsistent when encountering other types of distribution shifts. In other words, they only reliably detect unexpected inputs that they have been specifically designed to expect. As a first step toward broad OOD detection, we learn a generative model of existing detection scores with a Gaussian mixture. By doing so, we present an ensemble approach that offers a more consistent and comprehensive solution for broad OOD detection, demonstrating superior performance compared to existing methods. Our code to download BROAD and reproduce our experiments is publicly available.
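
A minimal sketch of the ensemble idea described in the abstract, assuming per-sample score vectors from several existing detectors have already been computed (the detectors, data, and threshold choice below are placeholders, not the BROAD pipeline itself): fit a Gaussian mixture on in-distribution score vectors and use its log-likelihood as a unified OOD score.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Rows are samples, columns are scores from existing OOD detectors
# (e.g. max-softmax, energy, Mahalanobis); values here are synthetic stand-ins.
rng = np.random.default_rng(0)
scores_in_train = rng.normal(loc=0.0, scale=1.0, size=(5000, 3))  # in-distribution scores (fit)
scores_test = rng.normal(loc=0.5, scale=1.5, size=(1000, 3))      # scores for incoming inputs

# Model the joint distribution of in-distribution detection scores.
gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
gmm.fit(scores_in_train)

# Low log-likelihood under the in-distribution score model => flag as OOD.
ood_score = -gmm.score_samples(scores_test)
threshold = np.percentile(-gmm.score_samples(scores_in_train), 95)
is_ood = ood_score > threshold
```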

Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation

Jul 05, 2023
Sébastien Lachapelle, Divyat Mahajan, Ioannis Mitliagkas, Simon Lacoste-Julien

We tackle the problems of latent variables identification and "out-of-support" image generation in representation learning. We show that both are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions under which exactly solving the reconstruction problem using an additive decoder is guaranteed to identify the blocks of latent variables up to permutation and block-wise invertible transformations. This guarantee relies only on very weak assumptions about the distribution of the latent factors, which might present statistical dependencies and have an almost arbitrarily shaped support. Our result provides a new setting where nonlinear independent component analysis (ICA) is possible and adds to our theoretical understanding of OCRL methods. We also show theoretically that additive decoders can generate novel images by recombining observed factors of variations in novel ways, an ability we refer to as Cartesian-product extrapolation. We show empirically that additivity is crucial for both identifiability and extrapolation on simulated data.

* 35 pages 
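
A toy sketch of the additive structure described above, under assumed block sizes and image dimensions (not the authors' architecture): the decoder is a sum of per-block decoders, each mapping its latent block to an object-specific image.

```python
import torch
import torch.nn as nn

class AdditiveDecoder(nn.Module):
    """Decodes latent blocks independently and sums the object-specific images."""
    def __init__(self, block_dims=(4, 4, 4), img_pixels=32 * 32):
        super().__init__()
        self.block_dims = block_dims
        self.block_decoders = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, img_pixels))
             for d in block_dims]
        )

    def forward(self, z):
        # Split the latent vector into blocks (one per object) and decode each.
        blocks = torch.split(z, list(self.block_dims), dim=-1)
        parts = [dec(b) for dec, b in zip(self.block_decoders, blocks)]
        return torch.stack(parts, dim=0).sum(dim=0)  # x = sum_k f_k(z_k)

decoder = AdditiveDecoder()
z = torch.randn(8, sum(decoder.block_dims))
x_hat = decoder(z)  # (8, 1024): a sum of object-specific images
```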

No Wrong Turns: The Simple Geometry Of Neural Networks Optimization Paths

Jun 20, 2023
Charles Guille-Escuret, Hiroki Naganuma, Kilian Fatras, Ioannis Mitliagkas

Understanding the optimization dynamics of neural networks is necessary for closing the gap between theory and practice. Stochastic first-order optimization algorithms are known to efficiently locate favorable minima in deep neural networks. This efficiency, however, contrasts with the non-convex and seemingly complex structure of neural loss landscapes. In this study, we delve into the fundamental geometric properties of sampled gradients along optimization paths. We focus on two key quantities, which appear in the restricted secant inequality and the error bound and are both highly significant for first-order optimization. Our analysis reveals that these quantities exhibit predictable, consistent behavior throughout training, despite the stochasticity induced by sampling minibatches. Our findings suggest that not only do optimization trajectories never encounter significant obstacles, but they also maintain stable dynamics during the majority of training. These observed properties are sufficiently expressive to theoretically guarantee linear convergence and prescribe learning rate schedules mirroring empirical practices. We conduct our experiments on image classification, semantic segmentation and language modeling across different batch sizes, network architectures, datasets, optimizers, and initialization seeds, and we discuss the impact of each factor. Our work provides novel insights into the properties of neural network loss functions, and opens the door to theoretical frameworks more relevant to prevalent practice.
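
A rough illustration of the two tracked quantities (a sketch only: it uses the final iterate as a stand-in for the reference minimizer x*, which is an assumption rather than the paper's exact protocol): the restricted-secant-inequality ratio <g_t, x_t - x*> / ||x_t - x*||^2 and the error-bound ratio ||g_t|| / ||x_t - x*||.

```python
import torch

def path_geometry(params_trajectory, grads_trajectory):
    """Compute RSI- and EB-style ratios along an optimization trajectory.

    params_trajectory / grads_trajectory: lists of flattened parameter and
    sampled-gradient tensors, one per logged step. The last logged iterate
    is used here as a proxy for the reference minimizer x*.
    """
    x_star = params_trajectory[-1]
    rsi, eb = [], []
    for x_t, g_t in zip(params_trajectory[:-1], grads_trajectory[:-1]):
        diff = x_t - x_star
        dist = diff.norm()
        rsi.append((torch.dot(g_t, diff) / dist**2).item())  # restricted secant inequality ratio
        eb.append((g_t.norm() / dist).item())                 # error bound ratio
    return rsi, eb
```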

Performative Prediction with Neural Networks

Apr 14, 2023
Mehrnaz Mofakhami, Ioannis Mitliagkas, Gauthier Gidel

Performative prediction is a framework for learning models that influence the data they intend to predict. We focus on finding classifiers that are performatively stable, i.e. optimal for the data distribution they induce. Standard convergence results for finding a performatively stable classifier with the method of repeated risk minimization assume that the data distribution is Lipschitz continuous with respect to the model's parameters. Under this assumption, the loss must be strongly convex and smooth in these parameters; otherwise, the method will diverge for some problems. In this work, we instead assume that the data distribution is Lipschitz continuous with respect to the model's predictions, a more natural assumption for performative systems. As a result, we are able to significantly relax the assumptions on the loss function. In particular, we do not need to assume convexity with respect to the model's parameters. As an illustration, we introduce a resampling procedure that models realistic distribution shifts and show that it satisfies our assumptions. We support our theory by showing that one can learn performatively stable classifiers with neural networks making predictions about real data that shift according to our proposed procedure.

* Published at AISTATS 2023 
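
A schematic of repeated risk minimization in this setting (a sketch with placeholder callables: `induce_distribution` stands in for a resampling procedure like the one proposed in the paper, and `fit` for ordinary training): retrain on the distribution the current model induces until the parameters stop moving.

```python
import copy
import torch

def repeated_risk_minimization(model, base_data, induce_distribution, fit,
                               n_rounds=20, tol=1e-4):
    """Alternate between deploying the model and refitting on the shifted data.

    induce_distribution(model, base_data) -> dataset shifted by the model's predictions
    fit(model, dataset) -> model trained to (approximate) risk minimization on dataset
    """
    for _ in range(n_rounds):
        shifted = induce_distribution(model, base_data)  # distribution reacts to the model
        new_model = fit(copy.deepcopy(model), shifted)
        # Stop when parameters barely change: an (approximately) performatively stable point.
        with torch.no_grad():
            gap = max((p1 - p0).abs().max().item()
                      for p0, p1 in zip(model.parameters(), new_model.parameters()))
        model = new_model
        if gap < tol:
            break
    return model
```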

Synergies Between Disentanglement and Sparsity: a Multi-Task Learning Perspective

Nov 26, 2022
Sébastien Lachapelle, Tristan Deleu, Divyat Mahajan, Ioannis Mitliagkas, Yoshua Bengio, Simon Lacoste-Julien, Quentin Bertrand

Although disentangled representations are often said to be beneficial for downstream tasks, current empirical and theoretical understanding is limited. In this work, we provide evidence that disentangled representations coupled with sparse base-predictors improve generalization. In the context of multi-task learning, we prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations. Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem. Finally, we explore a meta-learning version of this algorithm based on group Lasso multiclass SVM base-predictors, for which we derive a tractable dual formulation. It obtains competitive results on standard few-shot classification benchmarks, while each task uses only a fraction of the learned representations.

* 36 pages 
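
A toy sketch of the sparsity side of the argument (illustrative only: the dimensions, shared encoder, and penalty weight are assumptions, and this is plain joint training rather than the paper's bi-level or SVM-based formulation): per-task linear heads on a shared representation, penalized with a group Lasso so each task selects only a few latent dimensions.

```python
import torch
import torch.nn as nn

latent_dim, n_tasks, n_classes = 16, 5, 3
encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, latent_dim))
heads = nn.ModuleList([nn.Linear(latent_dim, n_classes) for _ in range(n_tasks)])

def group_lasso(head):
    # One group per latent dimension: the L2 norm of each input column of the head.
    return head.weight.norm(dim=0).sum()

def multitask_loss(x_per_task, y_per_task, lam=1e-2):
    ce = nn.CrossEntropyLoss()
    loss = 0.0
    for head, x, y in zip(heads, x_per_task, y_per_task):
        z = encoder(x)                                           # shared representation
        loss = loss + ce(head(z), y) + lam * group_lasso(head)   # sparse task-specific head
    return loss
```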

Empirical Study on Optimizer Selection for Out-of-Distribution Generalization

Nov 18, 2022
Hiroki Naganuma, Kartik Ahuja, Shiro Takagi, Tetsuya Motokawa, Rio Yokota, Kohta Ishikawa, Ikuro Sato, Ioannis Mitliagkas

Modern deep learning systems are fragile and do not generalize well under distribution shifts. While much promising work has been accomplished to address these concerns, a systematic study of the role of optimizers and their out-of-distribution generalization performance has not been undertaken. In this study, we examine the performance of popular first-order optimizers for different classes of distributional shift under empirical risk minimization and invariant risk minimization. We address the problem settings for image and text classification using DomainBed, WILDS, and Backgrounds Challenge as out-of-distribution datasets for this exhaustive study. We search over a wide range of hyperparameters and examine the classification accuracy (in-distribution and out-of-distribution) for over 20,000 models. We arrive at the following findings: i) contrary to conventional wisdom, adaptive optimizers (e.g., Adam) perform worse than non-adaptive optimizers (e.g., SGD, momentum-based SGD); ii) in-distribution performance and out-of-distribution performance exhibit three types of behavior depending on the dataset: linear returns, increasing returns, and diminishing returns. We believe these findings can help practitioners choose the right optimizer and know what behavior to expect.

* NeurIPS 2022 Workshop on Distribution Shifts (DistShift) 
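
A stripped-down version of the kind of sweep described above (the optimizer grid, learning rates, and data loaders are placeholders; the actual study covers DomainBed/WILDS benchmarks, IRM as well as ERM, and far larger hyperparameter grids): train the same model under each optimizer and compare in-distribution vs. out-of-distribution accuracy.

```python
import torch
import torch.nn as nn

def accuracy(model, loader):
    correct = total = 0
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(dim=-1) == y).sum().item()
            total += y.numel()
    return correct / total

def sweep_optimizers(make_model, train_loader, id_loader, ood_loader, epochs=10):
    configs = {
        "sgd":      lambda p: torch.optim.SGD(p, lr=0.1),
        "momentum": lambda p: torch.optim.SGD(p, lr=0.1, momentum=0.9),
        "adam":     lambda p: torch.optim.Adam(p, lr=1e-3),
    }
    results = {}
    for name, make_opt in configs.items():
        model = make_model()
        opt = make_opt(model.parameters())
        ce = nn.CrossEntropyLoss()
        model.train()
        for _ in range(epochs):              # plain ERM training loop
            for x, y in train_loader:
                opt.zero_grad()
                ce(model(x), y).backward()
                opt.step()
        results[name] = (accuracy(model, id_loader), accuracy(model, ood_loader))
    return results  # {optimizer: (in-distribution acc, out-of-distribution acc)}
```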

Empirical Analysis of Model Selection for Heterogenous Causal Effect Estimation

Nov 03, 2022
Divyat Mahajan, Ioannis Mitliagkas, Brady Neal, Vasilis Syrgkanis

We study the problem of model selection in causal inference, specifically for the case of conditional average treatment effect (CATE) estimation under binary treatments. Unlike model selection in machine learning, we cannot use the technique of cross-validation here as we do not observe the counterfactual potential outcome for any data point. Hence, we need to design model selection techniques that do not explicitly rely on counterfactual data. As an alternative to cross-validation, a variety of proxy metrics have been proposed in the literature that depend on auxiliary nuisance models, themselves estimated from the data (propensity score model, outcome regression model). However, the effectiveness of these metrics has only been studied on synthetic datasets, where the counterfactual data can be observed. We conduct an extensive empirical analysis to judge the performance of these metrics, where we utilize the latest advances in generative modeling to incorporate multiple realistic datasets. We evaluate 9 metrics on 144 datasets for selecting between 415 estimators per dataset, including datasets that closely mimic real-world datasets. Further, we use the latest techniques from AutoML to ensure consistent hyperparameter selection for nuisance models for a fair comparison across metrics.

* Preprint. Under Review 
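
To make the notion of a proxy metric concrete, here is a sketch of one common choice, a doubly-robust pseudo-outcome score (presented as an illustration of the metric family, not as one of the nine metrics evaluated in the paper); `mu0`, `mu1`, and `e` are nuisance models assumed to be fitted on held-out data.

```python
import numpy as np

def dr_score(tau_hat, X, T, Y, mu0, mu1, e):
    """Doubly-robust proxy metric for ranking CATE estimators (lower is better).

    tau_hat: candidate CATE estimator, tau_hat(X) -> predicted treatment effects
    mu0, mu1: outcome regressions for control/treated arms; e: propensity model
    """
    m0, m1, ps = mu0(X), mu1(X), np.clip(e(X), 0.01, 0.99)
    # Pseudo-outcome whose conditional expectation equals the true CATE
    # when the nuisance models are correct.
    y_pseudo = m1 - m0 + (T / ps) * (Y - m1) - ((1 - T) / (1 - ps)) * (Y - m0)
    return np.mean((y_pseudo - tau_hat(X)) ** 2)
```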

Towards Out-of-Distribution Adversarial Robustness

Oct 10, 2022
Adam Ibrahim, Charles Guille-Escuret, Ioannis Mitliagkas, Irina Rish, David Krueger, Pouya Bashivan

Adversarial robustness continues to be a major challenge for deep learning. A core issue is that robustness to one type of attack often fails to transfer to other attacks. While prior work establishes a theoretical trade-off in robustness against different $L_p$ norms, we show that there is potential for improvement against many commonly used attacks by adopting a domain generalisation approach. Concretely, we treat each type of attack as a domain, and apply the Risk Extrapolation method (REx), which promotes similar levels of robustness against all training attacks. Compared to existing methods, we obtain similar or superior worst-case adversarial robustness on attacks seen during training. Moreover, we achieve superior performance on families or tunings of attacks only encountered at test time. On ensembles of attacks, our approach improves the accuracy from 3.4% with the best existing baseline to 25.9% on MNIST, and from 16.9% to 23.5% on CIFAR10.

* Under review ICLR 2023 
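
A sketch of applying a REx-style objective to attack domains as described above (the attack list and penalty weight are placeholders supplied by the caller): treat each attack as a domain, compute the per-domain adversarial risk, and penalize the variance of those risks.

```python
import torch
import torch.nn as nn

def rex_adversarial_loss(model, x, y, attacks, beta=10.0):
    """V-REx-style loss over attack 'domains'.

    attacks: list of callables, each mapping (model, x, y) -> adversarially
    perturbed inputs (e.g. PGD under different L_p norms); assumed given.
    """
    ce = nn.CrossEntropyLoss()
    risks = torch.stack([ce(model(atk(model, x, y)), y) for atk in attacks])
    # The mean term trains for robustness on average; the variance term pushes
    # per-attack risks toward similar levels, as in Risk Extrapolation (REx).
    return risks.mean() + beta * risks.var()
```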

CADet: Fully Self-Supervised Anomaly Detection With Contrastive Learning

Oct 04, 2022
Charles Guille-Escuret, Pau Rodriguez, David Vazquez, Ioannis Mitliagkas, Joao Monteiro

Handling out-of-distribution (OOD) samples has become a major stake in the real-world deployment of machine learning systems. This work explores the application of self-supervised contrastive learning to the simultaneous detection of two types of OOD samples: unseen classes and adversarial perturbations. Since in practice the distribution of such samples is not known in advance, we do not assume access to OOD examples. We show that similarity functions trained with contrastive learning can be leveraged with the maximum mean discrepancy (MMD) two-sample test to verify whether two independent sets of samples are drawn from the same distribution. Inspired by this approach, we introduce CADet (Contrastive Anomaly Detection), a method based on image augmentations to perform anomaly detection on single samples. CADet compares favorably to adversarial detection methods to detect adversarially perturbed samples on ImageNet. Simultaneously, it achieves comparable performance to unseen label detection methods on two challenging benchmarks: ImageNet-O and iNaturalist. CADet is fully self-supervised and requires neither labels for in-distribution samples nor access to OOD examples.
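
A minimal sketch of the MMD two-sample-test ingredient mentioned above, with a kernel built on (contrastive) embeddings; the encoder, kernel choice, and bandwidth are assumptions, and this is not the full CADet procedure, which scores single samples via augmentations.

```python
import torch

def mmd_from_embeddings(emb_a, emb_b, bandwidth=1.0):
    """MMD^2 estimate with an RBF kernel on embedded samples.

    emb_a, emb_b: (n, d) and (m, d) tensors of features from a trained encoder,
    e.g. a self-supervised contrastive model applied to two sets of images.
    """
    def rbf(u, v):
        d2 = torch.cdist(u, v) ** 2
        return torch.exp(-d2 / (2 * bandwidth**2))

    k_aa, k_bb, k_ab = rbf(emb_a, emb_a), rbf(emb_b, emb_b), rbf(emb_a, emb_b)
    return k_aa.mean() + k_bb.mean() - 2 * k_ab.mean()

# A large MMD^2 (relative to a permutation-test null) suggests the two sets of
# samples were not drawn from the same distribution.
```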

A Reproducible and Realistic Evaluation of Partial Domain Adaptation Methods

Oct 03, 2022
Tiago Salvador, Kilian Fatras, Ioannis Mitliagkas, Adam Oberman

Unsupervised Domain Adaptation (UDA) aims at classifying unlabeled target images by leveraging labeled source ones. In this work, we consider the Partial Domain Adaptation (PDA) variant, where we have extra source classes not present in the target domain. Most successful algorithms use model selection strategies that rely on target labels to find the best hyper-parameters and/or models along training. However, these strategies violate the main assumption in PDA: only unlabeled target domain samples are available. Moreover, there are also inconsistencies in the experimental settings (architecture, hyper-parameter tuning, number of runs) yielding unfair comparisons. The main goal of this work is to provide a realistic evaluation of PDA methods with the different model selection strategies under a consistent evaluation protocol. We evaluate 7 representative PDA algorithms on 2 different real-world datasets using 7 different model selection strategies. Our two main findings are: (i) without target labels for model selection, the accuracy of the methods decreases by up to 30 percentage points; (ii) only one method and model selection pair performs well on both datasets. Experiments were performed with our PyTorch framework, BenchmarkPDA, which we open source.

* 17 pages, 13 tables 
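
As an example of a model selection strategy that never touches target labels (one simple heuristic of the kind compared in such studies, not necessarily one of the paper's seven strategies): pick the checkpoint whose predictions on unlabeled target data have the lowest average entropy.

```python
import torch
import torch.nn.functional as F

def mean_prediction_entropy(model, target_loader):
    """Average entropy of the model's predictions on unlabeled target images."""
    model.eval()
    ent, n = 0.0, 0
    with torch.no_grad():
        for x in target_loader:                  # no target labels available in PDA
            p = F.softmax(model(x), dim=-1)
            ent += -(p * p.clamp_min(1e-12).log()).sum(dim=-1).sum().item()
            n += x.shape[0]
    return ent / n

def select_checkpoint(checkpoints, target_loader):
    # Lower entropy = more confident target predictions, used as a label-free proxy.
    return min(checkpoints, key=lambda m: mean_prediction_entropy(m, target_loader))
```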