Miguel Rodrigues

Impact of Noise on Calibration and Generalisation of Neural Networks

Jun 30, 2023
Martin Ferianc, Ondrej Bohdal, Timothy Hospedales, Miguel Rodrigues

Noise injection and data augmentation strategies have been effective for enhancing the generalisation and robustness of neural networks (NNs). Certain types of noise, such as label smoothing and MixUp, have also been shown to improve calibration. Since noise can be added at various stages of an NN's training, this motivates the question of when and where noise is most effective. We study a variety of noise types to determine how much they improve calibration and generalisation, and under what conditions. More specifically, we evaluate various noise-injection strategies in both in-distribution (ID) and out-of-distribution (OOD) scenarios. The findings highlight that activation noise was the most transferable and effective in improving generalisation, while input-augmentation noise was prominent in improving calibration on OOD but not necessarily ID data.
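For intuition, here is a minimal PyTorch sketch contrasting two of the injection points discussed above: Gaussian noise added to the input versus to a hidden activation. The architecture, noise scale `sigma`, and placement are illustrative assumptions, not the paper's exact experimental protocol.

```python
import torch
import torch.nn as nn

class NoisyMLP(nn.Module):
    """Toy classifier with optional Gaussian noise at the input or at a hidden activation."""
    def __init__(self, noise_where="activation", sigma=0.1):
        super().__init__()
        self.fc1, self.fc2 = nn.Linear(32, 64), nn.Linear(64, 10)
        self.noise_where, self.sigma = noise_where, sigma

    def _noise(self, x):
        # Inject noise only while training, as is standard for stochastic regularisers.
        return x + self.sigma * torch.randn_like(x) if self.training else x

    def forward(self, x):
        if self.noise_where == "input":
            x = self._noise(x)
        h = torch.relu(self.fc1(x))
        if self.noise_where == "activation":
            h = self._noise(h)
        return self.fc2(h)

model = NoisyMLP(noise_where="activation", sigma=0.1)
logits = model(torch.randn(8, 32))  # noisy forward pass in train mode
```

Switching `noise_where` between "input" and "activation" toggles between the two strategies being compared.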

* Accepted at the ICML 2023 Workshop on Spurious Correlations, Invariance, and Stability. Martin and Ondrej contributed equally.

An Information-Theoretic Approach to Semi-supervised Transfer Learning

Jun 11, 2023
Daniel Jakubovitz, David Uliel, Miguel Rodrigues, Raja Giryes

Transfer learning is a valuable tool in deep learning, as it allows propagating information from one "source dataset" to another "target dataset", especially when the latter has only a small number of training examples. Yet, discrepancies between the underlying distributions of the source and target data are commonplace and are known to have a substantial impact on algorithm performance. In this work, we suggest novel information-theoretic approaches for analyzing the performance of deep neural networks in the context of transfer learning. We focus on the task of semi-supervised transfer learning, in which unlabeled samples from the target dataset are available during network training on the source dataset. Our theory suggests that one may improve the transferability of a deep neural network by incorporating regularization terms on the target data based on information-theoretic quantities, namely the Mutual Information and the Lautum Information. We demonstrate the effectiveness of the proposed approaches in various semi-supervised transfer learning experiments.
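The Lautum information is the divergence $L(X;Y) = D(P_X \otimes P_Y \| P_{XY})$, i.e. the KL divergence with its arguments swapped relative to mutual information. As a hedged illustration of how such a regularizer could be estimated, the numpy sketch below assumes the features and (soft) labels are jointly Gaussian after centering, so the divergence has a closed form in the empirical covariances; the Gaussian assumption and all names are illustrative, not the paper's estimator.

```python
import numpy as np

def gaussian_lautum(x, y, eps=1e-6):
    """Lautum information L(X;Y) = KL(P_X P_Y || P_XY) under a zero-mean
    joint-Gaussian assumption, estimated from samples (rows of x and y)."""
    x = x - x.mean(0); y = y - y.mean(0)
    z = np.hstack([x, y])
    d = z.shape[1]
    cov_joint = np.cov(z, rowvar=False) + eps * np.eye(d)
    # Product of marginals: same covariance with the cross-blocks zeroed out.
    cov_prod = cov_joint.copy()
    cov_prod[:x.shape[1], x.shape[1]:] = 0.0
    cov_prod[x.shape[1]:, :x.shape[1]] = 0.0
    inv_joint = np.linalg.inv(cov_joint)
    _, ld_joint = np.linalg.slogdet(cov_joint)
    _, ld_prod = np.linalg.slogdet(cov_prod)
    # Closed-form KL(N(0, cov_prod) || N(0, cov_joint)), in nats.
    return 0.5 * (np.trace(inv_joint @ cov_prod) - d + ld_joint - ld_prod)

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 3))                          # "features"
y = 0.8 * x[:, :1] + 0.2 * rng.normal(size=(1000, 1))   # correlated "label"
print(gaussian_lautum(x, y))                            # > 0 when x and y are dependent
```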

* arXiv admin note: substantial text overlap with arXiv:1904.01670 

On the Generalization Error of Meta Learning for the Gibbs Algorithm

Apr 27, 2023
Yuheng Bu, Harsha Vardhan Tetali, Gholamali Aminian, Miguel Rodrigues, Gregory Wornell

We analyze the generalization ability of joint-training meta learning algorithms via the Gibbs algorithm. Our exact characterization of the expected meta generalization error for the meta Gibbs algorithm is based on symmetrized KL information, which measures the dependence between all meta-training datasets and the output parameters, including task-specific and meta parameters. Additionally, we derive an exact characterization of the meta generalization error for the super-task Gibbs algorithm, in terms of conditional symmetrized KL information within the super-sample and super-task frameworks introduced in Steinke and Zakynthinou (2020) and Hellstrom and Durisi (2022), respectively. Our results also enable us to provide novel distribution-free generalization error upper bounds for these Gibbs algorithms applicable to meta learning.
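For context, a minimal numpy sketch of the (single-task) Gibbs algorithm this line of analysis builds on: the output hypothesis is sampled from a posterior proportional to a prior times $\exp(-\gamma n \hat{L}_S(w))$. The finite hypothesis grid, uniform prior, and inverse temperature $\gamma$ are illustrative choices; the meta and super-task variants layer task structure on top of this basic sampler.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=1.0, size=20)   # training sample S

# Finite hypothesis space: candidate means on a grid, with a uniform prior.
hypotheses = np.linspace(-3, 3, 241)
gamma = 5.0                                      # inverse temperature

emp_loss = np.array([np.mean((data - w) ** 2) for w in hypotheses])
log_post = -gamma * len(data) * emp_loss         # + log uniform prior (constant)
post = np.exp(log_post - log_post.max())
post /= post.sum()                               # Gibbs posterior P(W | S)

w_sample = rng.choice(hypotheses, p=post)        # the Gibbs algorithm's output
print(f"sampled hypothesis: {w_sample:.3f}")
```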

* Accepted at ISIT 2023 

How Does Pseudo-Labeling Affect the Generalization Error of the Semi-Supervised Gibbs Algorithm?

Oct 15, 2022
Haiyun He, Gholamali Aminian, Yuheng Bu, Miguel Rodrigues, Vincent Y. F. Tan

This paper provides an exact characterization of the expected generalization error (gen-error) for semi-supervised learning (SSL) with pseudo-labeling via the Gibbs algorithm. This characterization is expressed in terms of the symmetrized KL information between the output hypothesis, the pseudo-labeled dataset, and the labeled dataset. It can be applied to obtain distribution-free upper and lower bounds on the gen-error. Our findings offer the new insight that the generalization performance of SSL with pseudo-labeling is affected not only by the information between the output hypothesis and the input training data but also by the information \emph{shared} between the \emph{labeled} and \emph{pseudo-labeled} data samples. To deepen our understanding, we further explore two examples -- mean estimation and logistic regression. In particular, we analyze how the ratio of the number of unlabeled to labeled data, $\lambda$, affects the gen-error under both scenarios. As $\lambda$ increases, the gen-error for mean estimation decreases and then saturates at a value larger than when all the samples are labeled; this gap can be quantified \emph{exactly} with our analysis and depends on the \emph{cross-covariance} between the labeled and pseudo-labeled data samples. In logistic regression, the gen-error and the variance component of the excess risk also decrease as $\lambda$ increases.
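A minimal scikit-learn sketch of the pseudo-labeling pipeline studied here: fit a model on $n$ labeled samples, pseudo-label $\lambda n$ unlabeled samples with it, and refit on the union. The synthetic data and the choice of logistic regression are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, lam = 50, 4                         # n labeled samples, lambda = unlabeled/labeled ratio
w_true = np.array([1.0, -2.0])

def sample(m):
    x = rng.normal(size=(m, 2))
    y = (x @ w_true + 0.3 * rng.normal(size=m) > 0).astype(int)
    return x, y

x_lab, y_lab = sample(n)
x_unlab, _ = sample(lam * n)           # true labels withheld

clf = LogisticRegression().fit(x_lab, y_lab)
y_pseudo = clf.predict(x_unlab)        # pseudo-label the unlabeled set

# Refit on labeled + pseudo-labeled data (the semi-supervised setting analyzed above).
x_all = np.vstack([x_lab, x_unlab])
y_all = np.concatenate([y_lab, y_pseudo])
clf_ssl = LogisticRegression().fit(x_all, y_all)
```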

* 29 pages, 8 figures 

Semi-Counterfactual Risk Minimization Via Neural Networks

Sep 28, 2022
Gholamali Aminian, Roberto Vega, Omar Rivasplata, Laura Toni, Miguel Rodrigues

Counterfactual risk minimization is a framework for offline policy optimization with logged data, which consists of context, action, propensity score, and reward for each sample point. In this work, we build on this framework and propose a learning method for settings where the rewards of some samples are not observed, so the logged data consist of a subset of samples with unknown rewards and a subset of samples with known rewards. This setting arises in many application domains, including advertising and healthcare. While reward feedback is missing for some samples, it is possible to leverage the unknown-reward samples to minimize the risk, and we refer to this setting as semi-counterfactual risk minimization. To approach this kind of learning problem, we derive new upper bounds on the true risk under the inverse propensity score estimator. We then build upon these bounds to propose a regularized counterfactual risk minimization method, where the regularization term is based on the logged unknown-rewards dataset only and is hence reward-independent. We also propose another algorithm based on generating pseudo-rewards for the logged unknown-rewards dataset. Experimental results with neural networks and benchmark datasets indicate that these algorithms can leverage the logged unknown-rewards dataset in addition to the logged known-reward dataset.
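The starting point is the inverse propensity score (IPS) estimate of the risk. The PyTorch sketch below pairs it with a generic propensity-weighted penalty on the unknown-reward samples as a hypothetical stand-in for the paper's reward-independent regularization term; the toy data, the penalty form, and the weight `lam` are all illustrative.

```python
import torch

def ips_risk(policy_probs, logged_propensities, rewards):
    """Inverse-propensity-score estimate of the negative expected reward."""
    weights = policy_probs / logged_propensities
    return -(weights * rewards).mean()

def unknown_reward_regularizer(policy_probs, logged_propensities):
    """Reward-independent term on samples whose rewards were never logged:
    a plain penalty on importance weights far from 1 (a hypothetical stand-in
    for the paper's regularizer), discouraging large deviation from the logger."""
    return ((policy_probs / logged_propensities) - 1.0).pow(2).mean()

# Toy logged bandit data: a known-reward subset and an unknown-reward subset.
pi_known = torch.rand(64, requires_grad=True)     # pi_w(a_i | x_i), known rewards
p_known, r_known = torch.rand(64) + 0.1, torch.rand(64)
pi_unknown = torch.rand(128, requires_grad=True)  # unknown-reward subset
p_unknown = torch.rand(128) + 0.1

lam = 0.5
loss = (ips_risk(pi_known, p_known, r_known)
        + lam * unknown_reward_regularizer(pi_unknown, p_unknown))
loss.backward()  # in practice pi_* would come from a policy network
```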

* Accepted in EWRL 2022 

Simple Regularisation for Uncertainty-Aware Knowledge Distillation

May 19, 2022
Martin Ferianc, Miguel Rodrigues

Uncertainty estimation in modern neural networks (NNs) is one of the most important steps towards deploying machine learning systems in meaningful real-world applications such as medicine, finance or autonomous systems. At the moment, ensembles of different NNs constitute the state of the art in both accuracy and uncertainty estimation across different tasks. However, ensembles of NNs are impractical under real-world constraints, since their computation and memory consumption scale linearly with the size of the ensemble, which increases their latency and deployment cost. In this work, we examine a simple regularisation approach for distribution-free knowledge distillation of an ensemble of machine learning models into a single NN. The aim of the regularisation is to preserve the diversity, accuracy and uncertainty-estimation characteristics of the original ensemble without any intricacies, such as fine-tuning. We demonstrate the generality of the approach on combinations of toy data, SVHN/CIFAR-10, simple to complex NN architectures and different tasks.
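A minimal PyTorch sketch of the distillation step: the student matches, via a KL objective, the ensemble's averaged predictive distribution. The linear "members", the student architecture, and the plain KL loss are illustrative; the paper's specific regularisation term is not reproduced here.

```python
import torch
import torch.nn.functional as F

ensemble = [torch.nn.Linear(16, 10) for _ in range(5)]  # stand-in ensemble members
student = torch.nn.Linear(16, 10)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(32, 16)
with torch.no_grad():
    # Teacher target: the mean of the members' predictive distributions.
    teacher_probs = torch.stack([F.softmax(m(x), dim=-1) for m in ensemble]).mean(0)

# One distillation step: student log-probs against the averaged teacher probs.
student_logp = F.log_softmax(student(x), dim=-1)
loss = F.kl_div(student_logp, teacher_probs, reduction="batchmean")
loss.backward()
opt.step()
```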

* Accepted to the ICML 2022 Workshop on Distribution-Free Uncertainty Quantification. The code can be found at: https://github.com/martinferianc/hydra_plus 

Tighter Expected Generalization Error Bounds via Convexity of Information Measures

Feb 24, 2022
Gholamali Aminian, Yuheng Bu, Gregory Wornell, Miguel Rodrigues

Generalization error bounds are essential to understanding machine learning algorithms. This paper presents novel expected generalization error upper bounds based on the average joint distribution between the output hypothesis and each input training sample. Multiple generalization error upper bounds based on different information measures are provided, including Wasserstein distance, total variation distance, KL divergence, and Jensen-Shannon divergence. Due to the convexity of the information measures, the proposed bounds in terms of Wasserstein distance and total variation distance are shown to be tighter than their counterparts based on individual samples in the literature. An example is provided to demonstrate the tightness of the proposed generalization error bounds.
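The convexity argument can be checked numerically: since total variation distance is convex in its first argument, evaluating it at the average of the per-sample joint distributions can only tighten the averaged individual-sample bound. A small numpy demonstration on discrete distributions (the distributions themselves are arbitrary illustrations):

```python
import numpy as np

def tv(p, q):
    """Total variation distance between discrete distributions."""
    return 0.5 * np.abs(p - q).sum()

rng = np.random.default_rng(0)
ref = rng.dirichlet(np.ones(6))                  # stand-in for the product P_W x P_Z
joints = rng.dirichlet(np.ones(6), size=10)      # stand-ins for per-sample joints

avg_joint = joints.mean(axis=0)
lhs = tv(avg_joint, ref)                         # bound via the average joint distribution
rhs = np.mean([tv(j, ref) for j in joints])      # average of individual-sample bounds
assert lhs <= rhs + 1e-12                        # Jensen: the averaged-joint bound is tighter
print(lhs, rhs)
```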

* 10 pages, 1 figure 

Minimax Demographic Group Fairness in Federated Learning

Jan 25, 2022
Afroditi Papadaki, Natalia Martinez, Martin Bertran, Guillermo Sapiro, Miguel Rodrigues

Federated learning is an increasingly popular paradigm that enables a large number of entities to collaboratively learn better models. In this work, we study minimax group fairness in federated learning scenarios where different participating entities may only have access to a subset of the population groups during the training phase. We formally analyze how our proposed group fairness objective differs from existing federated learning fairness criteria that impose similar performance across participants instead of demographic groups. We provide an optimization algorithm -- FedMinMax -- for solving the proposed problem that provably enjoys the performance guarantees of centralized learning algorithms. We experimentally compare the proposed approach against other state-of-the-art methods in terms of group fairness in various federated learning setups, showing that our approach exhibits competitive or superior performance.
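For intuition, a minimal centralized sketch of a minimax group-fairness objective: gradient descent on model parameters against exponentiated-gradient ascent on simplex weights over per-group risks. The toy data, linear model, and step sizes are illustrative, and this is not the federated FedMinMax protocol itself.

```python
import numpy as np

rng = np.random.default_rng(0)
groups = [rng.normal(loc=m, size=(100, 2)) for m in (0.0, 1.5, 3.0)]
labels = [(g[:, 0] + g[:, 1] > 2 * i).astype(float) for i, g in enumerate(groups)]

w = np.zeros(2)                     # linear model parameters
lam = np.ones(3) / 3                # adversarial group weights on the simplex

def group_risk(w, x, y):
    return np.mean((x @ w - y) ** 2)

for _ in range(200):
    risks = np.array([group_risk(w, x, y) for x, y in zip(groups, labels)])
    # Ascent on group weights: exponentiated-gradient step, renormalized to the simplex.
    lam *= np.exp(0.1 * risks)
    lam /= lam.sum()
    # Descent on model parameters under the current adversarial weighting.
    grad = sum(l * 2 * x.T @ (x @ w - y) / len(y)
               for l, x, y in zip(lam, groups, labels))
    w -= 0.05 * grad

print("per-group risks:", [round(group_risk(w, x, y), 3) for x, y in zip(groups, labels)])
```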

* arXiv admin note: substantial text overlap with arXiv:2110.01999 

Characterizing and Understanding the Generalization Error of Transfer Learning with Gibbs Algorithm

Nov 02, 2021
Yuheng Bu, Gholamali Aminian, Laura Toni, Miguel Rodrigues, Gregory Wornell

We provide an information-theoretic analysis of the generalization ability of Gibbs-based transfer learning algorithms by focusing on two popular transfer learning approaches, $\alpha$-weighted-ERM and two-stage-ERM. Our key result is an exact characterization of the generalization behavior using the conditional symmetrized KL information between the output hypothesis and the target training samples, given the source samples. Our results can also be applied to provide novel distribution-free generalization error upper bounds on these two aforementioned Gibbs algorithms. Our approach is versatile, as it also characterizes the generalization errors and excess risks of these two Gibbs algorithms in the asymptotic regime, where they converge to the $\alpha$-weighted-ERM and two-stage-ERM, respectively. Based on our theoretical results, we show that the benefits of transfer learning can be viewed as a bias-variance trade-off, with the bias induced by the source distribution and the variance induced by the lack of target samples. We believe this viewpoint can guide the choice of transfer learning algorithms in practice.
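A minimal PyTorch sketch of the two objectives analyzed: $\alpha$-weighted ERM mixes the target and source empirical risks in a single objective, while two-stage ERM pretrains on source data and then fine-tunes on the target data. The data, architecture, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(8, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
xs, ys = torch.randn(200, 8), torch.randint(0, 2, (200,))   # source sample
xt, yt = torch.randn(20, 8), torch.randint(0, 2, (20,))     # small target sample

# (1) alpha-weighted ERM: one objective mixing both empirical risks.
alpha = 0.7
for _ in range(50):
    opt.zero_grad()
    loss = (alpha * F.cross_entropy(model(xt), yt)
            + (1 - alpha) * F.cross_entropy(model(xs), ys))
    loss.backward()
    opt.step()

# (2) two-stage ERM: pretrain on source, then fine-tune on target.
model = torch.nn.Linear(8, 2)                                # fresh model for stage-wise training
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for data, target, steps in ((xs, ys, 50), (xt, yt, 20)):     # source stage, then target stage
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(data), target).backward()
        opt.step()
```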
