Yingzhen Li

Training Discrete Energy-Based Models with Energy Discrepancy

Jul 14, 2023
Tobias Schröder, Zijing Ou, Yingzhen Li, Andrew B. Duncan

Training energy-based models (EBMs) on discrete spaces is challenging because sampling over such spaces can be difficult. We propose to train discrete EBMs with energy discrepancy (ED), a novel type of contrastive loss functional which only requires evaluating the energy function at data points and their perturbed counterparts, and thus does not rely on sampling strategies like Markov chain Monte Carlo (MCMC). Energy discrepancy offers theoretical guarantees for a broad class of perturbation processes, of which we investigate three types: perturbations based on Bernoulli noise, on deterministic transforms, and on neighbourhood structures. We demonstrate their relative performance on lattice Ising models, binary synthetic data, and discrete image data sets.
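
As a rough illustration of this recipe, the sketch below implements a contrastive loss of the kind described above for binary data with a symmetric Bernoulli perturbation, assuming PyTorch. The function name `energy_fn`, the flip probability `eps`, the number of negatives `m` and the stabilisation weight `w` are all hypothetical choices, and the paper's exact estimator may differ in its details.

```python
import math
import torch

def energy_discrepancy_bernoulli(energy_fn, x, eps=0.1, m=8, w=1.0):
    """Contrastive ED-style loss for binary data (hedged sketch).

    energy_fn maps a (batch, dim) tensor in {0, 1} to per-sample energies U(x).
    eps (flip probability), m (number of negatives) and w (stabilisation
    weight) are illustrative knobs, not values from the paper.
    """
    batch, dim = x.shape
    # Forward perturbation: flip each bit independently with probability eps.
    y = torch.logical_xor(x.bool(), torch.rand_like(x) < eps).float()
    # Negatives: re-perturb y with the same kernel (valid because a symmetric
    # Bernoulli flip is its own reverse kernel).
    y_rep = y.unsqueeze(1).expand(batch, m, dim)
    x_neg = torch.logical_xor(y_rep.bool(), torch.rand_like(y_rep) < eps).float()

    u_pos = energy_fn(x)                                       # (batch,)
    u_neg = energy_fn(x_neg.reshape(batch * m, dim)).reshape(batch, m)

    # Stable evaluation of log( w/m + (1/m) * sum_j exp(U(x) - U(x'_j)) ).
    diff = u_pos.unsqueeze(1) - u_neg                          # (batch, m)
    pad = torch.full((batch, 1), math.log(w), device=x.device)
    return (torch.logsumexp(torch.cat([diff, pad], dim=1), dim=1) - math.log(m)).mean()
```

Note that only energy evaluations at data points and perturbed points appear; no MCMC chain is run at any stage.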

* Presented at ICML 2023 Workshop: Sampling and Optimization in Discrete Space (SODS 2023) 

Energy Discrepancies: A Score-Independent Loss for Energy-Based Models

Jul 12, 2023
Tobias Schröder, Zijing Ou, Jen Ning Lim, Yingzhen Li, Sebastian J. Vollmer, Andrew B. Duncan

Energy-based models are a simple yet powerful class of probabilistic models, but their widespread adoption has been limited by the computational burden of training them. We propose a novel loss function called Energy Discrepancy (ED) which does not rely on the computation of scores or on expensive Markov chain Monte Carlo. We show that ED approaches explicit score matching and the negative log-likelihood loss in different limits, effectively interpolating between the two. Consequently, minimum ED estimation overcomes the nearsightedness of score-based estimation methods while also enjoying theoretical guarantees. Through numerical experiments, we demonstrate that ED learns low-dimensional data distributions faster and more accurately than explicit score matching or contrastive divergence. For high-dimensional image data, we describe how the manifold hypothesis puts limitations on our approach and demonstrate the effectiveness of energy discrepancy by training the energy-based model as the prior of a variational decoder model.
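
For intuition, here is an analogous hedged sketch for continuous data with a Gaussian perturbation, where the scale `t` plays the role of the interpolation knob mentioned above; all names and defaults are illustrative assumptions rather than the paper's prescriptions.

```python
import math
import torch

def energy_discrepancy_gaussian(energy_fn, x, t=1.0, m=16, w=1.0):
    """Gaussian-perturbation ED loss (hedged sketch; t, m, w assumed knobs).

    Small t is described as behaving like score matching and large t like
    maximum likelihood, so t trades off local against global information.
    """
    batch, dim = x.shape
    y = x + math.sqrt(t) * torch.randn_like(x)        # perturb the data
    x_neg = y.unsqueeze(1) + math.sqrt(t) * torch.randn(batch, m, dim, device=x.device)

    u_pos = energy_fn(x)
    u_neg = energy_fn(x_neg.reshape(batch * m, dim)).reshape(batch, m)

    # log( w/m + (1/m) * sum_j exp(U(x) - U(x'_j)) ), computed stably.
    diff = u_pos.unsqueeze(1) - u_neg
    pad = torch.full((batch, 1), math.log(w), device=x.device)
    return (torch.logsumexp(torch.cat([diff, pad], dim=1), dim=1) - math.log(m)).mean()
```

The symmetric Gaussian kernel is its own reverse kernel, which is why the negatives can be drawn by simply re-perturbing the noisy point.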

On the Identifiability of Markov Switching Models

May 26, 2023
Carles Balsells-Rodas, Yixin Wang, Yingzhen Li

Identifiability of latent variable models has recently gained interest for its applications to interpretability and out-of-distribution generalisation. In this work, we study the identifiability of Markov switching models as a first step towards extending recent results to sequential latent variable models. We present identifiability conditions for first-order Markov dependency structures whose transition distributions are parametrised via non-linear Gaussians. Our experiments showcase the applicability of our approach to regime-dependent causal discovery and high-dimensional time series segmentation.
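
To make the model class concrete, the snippet below samples from a first-order Markov switching model with non-linear Gaussian transitions, the setting the abstract describes; the regime functions, transition matrix and noise level in the usage example are hypothetical placeholders.

```python
import numpy as np

def sample_msm(T, pi, A, fns, sigma, dim, seed=None):
    """Sample from a first-order Markov switching model (illustrative sketch).

    pi:  (K,) initial regime distribution; A: (K, K) regime transition matrix.
    fns: list of K non-linear mean functions f_k(x_prev) -> (dim,), one per
         regime, giving the non-linear Gaussian transition x_t ~ N(f_z(x_{t-1}), sigma^2 I).
    """
    rng = np.random.default_rng(seed)
    K = len(pi)
    z = rng.choice(K, p=pi)
    x = rng.normal(0.0, sigma, size=dim)          # arbitrary initial state
    states, obs = [z], [x]
    for _ in range(T - 1):
        z = rng.choice(K, p=A[z])                 # first-order regime dynamics
        x = fns[z](x) + rng.normal(0.0, sigma, size=dim)
        states.append(z)
        obs.append(x)
    return np.array(states), np.stack(obs)

# Hypothetical example: two regimes with different non-linear drifts.
# z, x = sample_msm(200, pi=np.array([0.5, 0.5]),
#                   A=np.array([[0.95, 0.05], [0.05, 0.95]]),
#                   fns=[lambda x: np.tanh(x), lambda x: -0.5 * x],
#                   sigma=0.1, dim=2)
```

Identifiability here asks when the regimes and transition functions of such a model can be recovered from the observed sequence alone.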

ESD: Expected Squared Difference as a Tuning-Free Trainable Calibration Measure

Mar 04, 2023
Hee Suk Yoon, Joshua Tian Jin Tee, Eunseop Yoon, Sunjae Yoon, Gwangsu Kim, Yingzhen Li, Chang D. Yoo

Studies have shown that modern neural networks tend to be poorly calibrated due to over-confident predictions. Traditionally, post-processing methods have been used to calibrate the model after training. In recent years, various trainable calibration measures have been proposed to incorporate calibration directly into the training process. However, these methods all rely on internal hyperparameters, and the performance of the calibration objectives depends on tuning them, incurring ever greater computational cost as neural networks and datasets grow larger. We therefore present Expected Squared Difference (ESD), a tuning-free (i.e., hyperparameter-free) trainable calibration objective that views the calibration error as the squared difference between two expectations. Through extensive experiments on several architectures (CNNs, Transformers) and datasets, we demonstrate that (1) incorporating ESD into training improves model calibration across various batch-size settings without the need for internal hyperparameter tuning, (2) ESD yields the best-calibrated results compared with previous approaches, and (3) ESD drastically reduces the computational cost of calibration during training owing to the absence of internal hyperparameters. The code is publicly accessible at https://github.com/hee-suk-yoon/ESD.
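
The abstract does not spell out the estimator, so the following is only a plausible sketch of a tuning-free, trainable calibration penalty in this spirit: an unbiased pairwise estimate of the squared difference between expected accuracy and expected confidence. The paper's exact ESD objective may differ; see the released code for the real definition.

```python
import torch

def esd_style_loss(logits, labels):
    """Illustrative tuning-free calibration penalty (hedged sketch, not
    necessarily the paper's exact estimator).

    Treats miscalibration as the squared difference between the expected
    accuracy and the expected confidence, estimated without bias via a
    pairwise U-statistic: (E[d])^2 ~= mean_{i != j} d_i * d_j for
    d_i = 1{correct_i} - confidence_i. Requires a batch of size >= 2.
    """
    probs = logits.softmax(dim=-1)
    conf, pred = probs.max(dim=-1)
    d = (pred == labels).float() - conf            # per-sample calibration gap
    n = d.shape[0]
    # Unbiased estimate of (E[d])^2: drop the diagonal i == j terms.
    total = d.sum()
    return (total * total - (d * d).sum()) / (n * (n - 1))
```

The penalty is differentiable through the confidences and has no internal knobs, which is the property the abstract emphasises.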

* ICLR 2023 

Calibrating Transformers via Sparse Gaussian Processes

Mar 04, 2023
Wenlong Chen, Yingzhen Li

Transformer models have achieved profound success in prediction tasks across a wide range of applications in natural language processing, speech recognition and computer vision. Extending the Transformer's success to safety-critical domains requires calibrated uncertainty estimation, which remains under-explored. To address this, we propose Sparse Gaussian Process attention (SGPA), which performs Bayesian inference directly in the output space of multi-head attention blocks (MHAs) of a Transformer to calibrate its uncertainty. It replaces the scaled dot-product operation with a valid symmetric kernel and uses sparse Gaussian process (SGP) techniques to approximate the posterior processes of the MHA outputs. Empirically, on a suite of prediction tasks on text, images and graphs, SGPA-based Transformers achieve competitive predictive accuracy while noticeably improving both in-distribution calibration and out-of-distribution robustness and detection.
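
A minimal sketch of the kernel-attention ingredient is given below: the scaled dot-product score is swapped for a valid symmetric (here RBF) kernel. The sparse-GP posterior machinery that gives SGPA its uncertainty estimates is omitted, and the `lengthscale` knob is an assumption.

```python
import torch

def kernel_attention(q, k, v, lengthscale=1.0):
    """Single-head attention with the scaled dot-product replaced by a valid
    symmetric (RBF) kernel -- a minimal sketch of the kernel-attention idea
    behind SGPA, without the sparse-GP posterior-variance machinery.

    q, k, v: (batch, seq, dim) tensors; lengthscale is an assumed knob.
    """
    # Squared Euclidean distances between queries and keys.
    sq_dists = torch.cdist(q, k, p=2.0) ** 2          # (batch, seq_q, seq_k)
    scores = torch.exp(-0.5 * sq_dists / lengthscale ** 2)
    weights = scores / scores.sum(dim=-1, keepdim=True)
    return weights @ v
```

On top of this, SGPA propagates a posterior covariance through the attention outputs via inducing-point (sparse GP) approximations, which is what this sketch leaves out.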

* Accepted for publication at The Eleventh International Conference on Learning Representations (ICLR 2023) 

Scalable Infomin Learning

Feb 21, 2023
Yanzhi Chen, Weihao Sun, Yingzhen Li, Adrian Weller

The task of infomin learning aims to learn a representation with high utility while being uninformative about a specified target, the latter achieved by minimising the mutual information between the representation and the target. It has broad applications, ranging from training fair prediction models against protected attributes to unsupervised learning with disentangled representations. Recent works on infomin learning mainly use adversarial training, which involves training a neural network to estimate mutual information or a proxy thereof, and is therefore slow and difficult to optimise. Drawing on recent advances in slicing techniques, we propose a new infomin learning approach based on a novel proxy metric for mutual information. We further derive an accurate and analytically computable approximation to this proxy metric, thereby removing the need to construct neural-network-based mutual information estimators. Experiments on algorithmic fairness, disentangled representation learning and domain adaptation verify that our method can effectively remove unwanted information within a limited time budget.
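
As an illustration of the slicing idea, the sketch below computes an analytically evaluable dependence proxy from random one-dimensional projections; it is one plausible instantiation of "project to 1-D and measure dependence in closed form", not necessarily the paper's exact proxy metric.

```python
import torch

def sliced_correlation_proxy(z, t, num_slices=64):
    """Slicing-based proxy for the dependence between a representation
    z (batch, dz) and a target t (batch, dt) -- an illustrative sketch.

    Each slice projects both variables to 1-D and measures their squared
    Pearson correlation; the average is analytic and differentiable, so it
    can be minimised directly as a regulariser with no estimator network.
    """
    dz, dt = z.shape[1], t.shape[1]
    wz = torch.randn(dz, num_slices, device=z.device)   # random slices
    wt = torch.randn(dt, num_slices, device=t.device)
    zp = z @ wz                                         # (batch, num_slices)
    tp = t @ wt
    zp = (zp - zp.mean(0)) / (zp.std(0) + 1e-8)         # standardise per slice
    tp = (tp - tp.mean(0)) / (tp.std(0) + 1e-8)
    corr = (zp * tp).mean(0)                            # Pearson r per slice
    return (corr ** 2).mean()
```

Because everything is in closed form, the proxy avoids the inner adversarial loop that makes neural estimators of mutual information slow and unstable.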

* 10 pages, accepted to NeurIPS 2022, slightly improved version 

Markovian Gaussian Process Variational Autoencoders

Jul 12, 2022
Harrison Zhu, Carles Balsells Rodas, Yingzhen Li

Deep generative models are widely used for modelling high-dimensional time series such as video animations, audio and climate data. Sequential variational autoencoders have been successfully considered for many applications, with many variants relying on discrete-time methods and recurrent neural networks (RNNs). Continuous-time methods, on the other hand, have recently gained traction, especially in the context of irregularly sampled time series, where they handle the data better than discrete-time methods. One such class is Gaussian process variational autoencoders (GPVAEs), where the VAE prior is set to a Gaussian process (GP), allowing inductive biases to be encoded explicitly via the kernel function and making the latent space interpretable. A major limitation of GPVAEs, however, is that they inherit the cubic computational cost of GPs. In this work, we leverage the equivalent discrete state-space representation of Markovian GPs to enable a linear-time GP solver via Kalman filtering and smoothing. We show on corrupted-frame and missing-frame tasks that our method performs favourably, especially on the latter, where it outperforms RNN-based models.
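
The linear-time solver idea can be illustrated in one dimension: a Matern-1/2 (Ornstein-Uhlenbeck) kernel admits an exact discrete state-space form, so the GP marginal likelihood is computable in O(T) by Kalman filtering rather than O(T^3) by Cholesky factorisation. The sketch below assumes equally spaced observations and hypothetical kernel and noise values; the paper applies this machinery inside the GPVAE latent prior.

```python
import numpy as np

def ou_kalman_loglik(y, dt, lengthscale=1.0, sig_f=1.0, sig_n=0.1):
    """O(T) log marginal likelihood of a GP with Matern-1/2 (OU) kernel,
    k(tau) = sig_f^2 * exp(-|tau| / lengthscale), via Kalman filtering.

    y: (T,) observations at equal spacing dt; kernel/noise values are assumed.
    """
    a = np.exp(-dt / lengthscale)          # exact OU transition coefficient
    q = sig_f**2 * (1.0 - a**2)            # matching process-noise variance
    m, p, ll = 0.0, sig_f**2, 0.0          # stationary initial state
    for t in range(len(y)):
        if t > 0:                          # predict step
            m, p = a * m, a * a * p + q
        s = p + sig_n**2                   # innovation variance
        r = y[t] - m                       # innovation
        ll += -0.5 * (np.log(2 * np.pi * s) + r * r / s)
        k = p / s                          # Kalman gain; update step
        m, p = m + k * r, (1.0 - k) * p
    return ll
```

Irregular spacing is handled by recomputing `a` and `q` per step from the local gap, which is what makes the state-space view attractive for irregularly sampled series.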

* Non-archival paper presented at Workshop on Continuous Time Methods for Machine Learning. The 39th International Conference on Machine Learning, Baltimore 

Repairing Neural Networks by Leaving the Right Past Behind

Jul 11, 2022
Ryutaro Tanno, Melanie F. Pradier, Aditya Nori, Yingzhen Li

Prediction failures of machine learning models often arise from deficiencies in the training data, such as incorrect labels, outliers, and selection biases. However, the data points responsible for a given failure mode are generally not known a priori, let alone a mechanism for repairing the failure. This work draws on the Bayesian view of continual learning and develops a generic framework for both identifying training examples that have given rise to the target failure and fixing the model by erasing information about them. The framework naturally allows recent advances in continual learning to be brought to bear on this new problem of model repairment, while subsuming existing work on influence functions and data deletion as specific instances. Experimentally, the proposed approach outperforms the baselines at both identifying detrimental training data and fixing model failures in a generalisable manner.
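
As a crude stand-in for the identification step, the sketch below ranks training points by how strongly their loss gradients align with the gradient on observed failures. This is in the spirit of the influence-function instances the framework subsumes; the paper's Bayesian continual-learning treatment is richer, so treat this as an assumption-laden illustration only.

```python
import torch

def influence_scores(model, loss_fn, train_batch, failure_batch):
    """Rank training points by the alignment of their loss gradients with the
    gradient on observed failures (illustrative identification heuristic).

    High scores flag candidates whose removal or unlearning ("erasing
    information about them") should reduce the failure-mode loss.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    fx, fy = failure_batch
    fail_grad = torch.autograd.grad(loss_fn(model(fx), fy), params)
    tx, ty = train_batch
    scores = []
    for i in range(tx.shape[0]):           # per-example gradient alignment
        g = torch.autograd.grad(loss_fn(model(tx[i:i + 1]), ty[i:i + 1]), params)
        scores.append(sum((a * b).sum() for a, b in zip(g, fail_grad)).item())
    return torch.tensor(scores)
```

The repair step then erases the flagged examples' contribution, e.g. via a continual-learning-style posterior update, rather than naive retraining from scratch.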

* 19 pages, 9 figures 

Learning Set Functions Under the Optimal Subset Oracle via Equivariant Variational Inference

Mar 03, 2022
Zijing Ou, Tingyang Xu, Qinliang Su, Yingzhen Li, Peilin Zhao, Yatao Bian

Learning set functions is becoming increasingly important in many applications, such as product recommendation and compound selection in AI-aided drug discovery. Most existing works study set function learning under the function-value oracle, which, however, requires expensive supervision signals. This renders them impractical for applications with only weak supervision under the Optimal Subset (OS) oracle, the study of which is surprisingly overlooked. In this work, we present a principled yet practical maximum likelihood learning framework, termed EquiVSet, that simultaneously meets the following desiderata for learning set functions under the OS oracle: i) permutation invariance of the set mass function being modelled; ii) permission of a varying ground set; iii) full differentiability; iv) minimum prior; and v) scalability. The main components of our framework are an energy-based treatment of the set mass function, DeepSet-style architectures to handle permutation invariance, mean-field variational inference, and its amortised variants. Although the framework is embarrassingly simple, empirical studies on three real-world applications (Amazon product recommendation, set anomaly detection, and compound selection for virtual screening) demonstrate that EquiVSet outperforms the baselines by a large margin.
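
The permutation-invariance ingredient can be sketched directly: a DeepSet-style energy sum-pools per-element features over the selected subset, so the energy is order-invariant and tolerates a varying ground set. Layer sizes and the masking interface below are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SetEnergy(nn.Module):
    """Permutation-invariant energy over subsets, DeepSet-style -- a minimal
    sketch of the energy-based set mass function component.

    The mass of a subset S of ground set V is modelled as
    p(S) proportional to exp(-E(S)); sum-pooling makes E invariant to element
    ordering and lets the ground set vary in size.
    """
    def __init__(self, dim_in, dim_hidden=128):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(dim_in, dim_hidden), nn.ReLU(),
                                 nn.Linear(dim_hidden, dim_hidden))
        self.rho = nn.Sequential(nn.Linear(dim_hidden, dim_hidden), nn.ReLU(),
                                 nn.Linear(dim_hidden, 1))

    def forward(self, x, mask):
        # x: (batch, n, dim_in) ground-set features; mask: (batch, n) in {0,1}
        # selecting the subset S. Sum-pool only the selected elements.
        pooled = (self.phi(x) * mask.unsqueeze(-1)).sum(dim=1)
        return self.rho(pooled).squeeze(-1)            # per-set energy E(S)
```

Mean-field variational inference over the binary mask, and its amortised variants, then make maximum likelihood training under the OS oracle tractable.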

A Principled Approach to Failure Analysis and Model Repairment: Demonstration in Medical Imaging

Sep 25, 2021
Thomas Henn, Yasukazu Sakamoto, Clément Jacquet, Shunsuke Yoshizawa, Masamichi Andou, Stephen Tchen, Ryosuke Saga, Hiroyuki Ishihara, Katsuhiko Shimizu, Yingzhen Li, Ryutaro Tanno

Machine learning models commonly exhibit unexpected failures post-deployment due to either data shifts or uncommon situations in the training environment. Domain experts typically go through the tedious process of inspecting the failure cases manually, identifying failure modes, and then attempting to fix the model. In this work, we aim to standardise and bring principles to this process by answering two critical questions: (i) how do we know that we have identified meaningful and distinct failure types?; (ii) how can we validate that a model has, indeed, been repaired? We suggest that the quality of the identified failure types can be validated through measuring the intra- and inter-type generalisation after fine-tuning, and introduce metrics to compare different subtyping methods. Furthermore, we argue that a model can be considered repaired if it achieves high accuracy on the failure types while retaining performance on the previously correct data. We combine these two ideas into a principled framework for evaluating the quality of both the identified failure subtypes and the model repairment. We evaluate its utility on a classification task and an object detection task. Our code is available at https://github.com/Rokken-lab6/Failure-Analysis-and-Model-Repairment
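
The evaluation protocol described above can be sketched in a few lines: fine-tune a fresh copy of the model on each identified failure subtype and compare intra-type with inter-type generalisation. The `model_init`, `finetune` and `evaluate` callables below are assumed user-supplied interfaces, not the paper's API.

```python
def subtype_generalisation(model_init, finetune, evaluate, subtypes):
    """Sketch of the intra-/inter-type generalisation check.

    subtypes: list of (train_cases, heldout_cases) pairs, one per identified
    failure subtype. Meaningful, distinct subtypes should show high intra-type
    transfer and low inter-type transfer after fine-tuning.
    """
    results = {}
    for i, (train_i, heldout_i) in enumerate(subtypes):
        model = finetune(model_init(), train_i)     # fresh copy per subtype
        intra = evaluate(model, heldout_i)          # same-subtype held-out cases
        inter = [evaluate(model, h)                 # all other subtypes
                 for j, (_, h) in enumerate(subtypes) if j != i]
        results[i] = {"intra": intra,
                      "inter": sum(inter) / max(len(inter), 1)}
    return results
```

A repaired model is then additionally required to retain its accuracy on the previously correct data, guarding against fixes that simply overfit the failure cases.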

* Medical Image Computing and Computer Assisted Intervention (MICCAI 2021), pp. 509-518 