Abstract: We introduce Integrated Weak Learning, a principled framework that integrates weak supervision into the training process of machine learning models. Our approach jointly trains the end-model and a label model that aggregates multiple sources of weak supervision. We introduce a label model that can learn to aggregate weak supervision sources differently for different datapoints and that takes the performance of the end-model during training into consideration. We show that our approach outperforms existing weak learning techniques across a set of 6 benchmark classification datasets. When both a small amount of labeled data and weak supervision are present, the performance gain is consistent and large, reliably yielding a 2-5 point test F1 score improvement over non-integrated methods.
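As a rough illustration of what a per-datapoint label model could look like (a minimal sketch, not the paper's actual architecture; the module name, shapes, and attention-style weighting are assumptions), the snippet below weights K weak supervision sources with datapoint-dependent scores:

# Hypothetical sketch: a label model that aggregates K weak sources with
# datapoint-dependent weights. All names and shapes are illustrative
# assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstanceWeightedLabelModel(nn.Module):
    def __init__(self, feature_dim, num_sources, num_classes):
        super().__init__()
        # Scores each weak source conditioned on the input features.
        self.source_scorer = nn.Linear(feature_dim, num_sources)
        self.num_classes = num_classes

    def forward(self, features, weak_votes):
        # features:   (B, feature_dim) end-model features per datapoint
        # weak_votes: (B, K) integer class votes from K weak sources
        weights = F.softmax(self.source_scorer(features), dim=-1)      # (B, K)
        votes = F.one_hot(weak_votes, self.num_classes).float()        # (B, K, C)
        # Datapoint-specific convex combination of the sources' votes.
        probs = torch.einsum('bk,bkc->bc', weights, votes)
        return probs  # soft labels used to supervise the end-model

Because the weights are produced from end-model features, the aggregation can adapt per datapoint, which is the property the abstract highlights.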
Abstract: Density-based out-of-distribution (OOD) detection has recently been shown to be unreliable for the task of detecting OOD images. Various density-ratio-based approaches achieve good empirical performance; however, such methods typically lack a principled probabilistic modelling explanation. In this work, we propose to unify density-ratio-based methods under a novel framework that builds energy-based models and employs differing base distributions. Under our framework, the density ratio can be viewed as the unnormalized density of an implicit semantic distribution. Further, we propose to directly estimate the density ratio of a data sample through class ratio estimation. We report competitive results on OOD image problems in comparison with recent work that alternatively requires training of deep generative models for the task. Our approach enables a simple yet effective path towards solving the OOD detection problem.
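The class-ratio idea can be illustrated with the standard density-ratio trick: a binary classifier trained to distinguish in-distribution data from samples of a base distribution yields log p(x)/q(x) from its logit. The sketch below shows this generic trick on toy features; the paper's choice of base distribution and estimator will differ.

# Sketch: estimate log p(x)/q(x) via binary class-ratio estimation.
# For a classifier c(x) = P(in-dist | x) trained on balanced data,
# logit(c(x)) equals the estimated log density ratio.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_ratio_estimator(x_in, x_base):
    X = np.concatenate([x_in, x_base], axis=0)
    y = np.concatenate([np.ones(len(x_in)), np.zeros(len(x_base))])
    return LogisticRegression(max_iter=1000).fit(X, y)

def ood_score(clf, x):
    # decision_function returns the logit, i.e. the log density ratio
    # (up to the log class-prior ratio, zero for balanced training sets).
    return -clf.decision_function(x)  # low ratio => more likely OOD

# Toy Gaussians standing in for image features:
rng = np.random.default_rng(0)
x_in, x_base = rng.normal(0, 1, (500, 8)), rng.normal(1, 2, (500, 8))
clf = fit_ratio_estimator(x_in, x_base)
print(ood_score(clf, rng.normal(0, 1, (5, 8))))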
Abstract: Latent variable models like the Variational Auto-Encoder (VAE) are commonly used to learn representations of images. However, for downstream tasks like semantic classification, the representations learned by VAEs are less competitive than those of other non-latent-variable models. This has led to some speculation that latent variable models may be fundamentally unsuitable for representation learning. In this work, we study what properties are required for good representations and how different VAE structure choices can affect the learned properties. We show that by using a decoder that prefers to learn local features, the remaining global features can be well captured by the latent, which significantly improves the performance of a downstream classification task. We further apply the proposed model to semi-supervised learning tasks and demonstrate improvements in data efficiency.
Abstract: The ability of likelihood-based probabilistic models to generalize to unseen data is central to many machine learning applications such as lossless compression. In this work, we study the generalization of a popular class of probabilistic models - the Variational Auto-Encoder (VAE). We point out two generalization gaps that can affect the generalization ability of VAEs and show that the over-fitting phenomenon is usually dominated by the amortized inference network. Based on this observation, we propose a new training objective, inspired by the classic wake-sleep algorithm, that improves the generalization properties of amortized inference. We also demonstrate how it can improve generalization performance in the context of image modeling and lossless compression.
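To make the two gaps concrete, one can separate a train-test gap (average ELBO on training versus held-out data) from an amortization gap (ELBO under the amortized encoder versus a per-example optimized posterior). Below is a hedged sketch of measuring both; it assumes a trained VAE `model` whose `encode(x)` returns `(mu, logvar)` and an `elbo(model, x, q_params)` function, all of which are illustrative names rather than the paper's API.

# Sketch of the two generalization gaps, under the assumptions stated above.
import torch

def generalization_gap(model, elbo, train_loader, test_loader):
    # Gap 1: difference between average ELBO on train and test data.
    def avg_elbo(loader):
        vals = [elbo(model, x, model.encode(x)).mean() for x in loader]
        return torch.stack(vals).mean()
    return avg_elbo(train_loader) - avg_elbo(test_loader)

def amortization_gap(model, elbo, x, steps=100, lr=1e-2):
    # Gap 2: amortized posterior vs a per-example optimized posterior.
    mu, logvar = model.encode(x)
    amortized = elbo(model, x, (mu, logvar)).mean()
    mu = mu.detach().clone().requires_grad_()
    logvar = logvar.detach().clone().requires_grad_()
    opt = torch.optim.Adam([mu, logvar], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -elbo(model, x, (mu, logvar)).mean()
        loss.backward()
        opt.step()
    return elbo(model, x, (mu, logvar)).mean() - amortized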
Abstract: The recently proposed Neural Local Lossless Compression (NeLLoC), which is based on a local autoregressive model, has achieved state-of-the-art (SOTA) out-of-distribution (OOD) generalization performance in the image compression task. In addition to encouraging OOD generalization, the local model also allows parallel inference in the decoding stage. In this paper, we propose a parallelization scheme for local autoregressive models. We discuss the practicalities of implementing this scheme, and provide experimental evidence of significant gains in compression runtime compared to the previous, non-parallel implementation.
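One way such parallelism can arise (a sketch of the general idea, not necessarily the paper's exact scheme): if pixel (i, j) depends only on already-decoded pixels within a local causal window of horizontal extent h, then (i, j) is ready as soon as (i, j-1) and (i-1, j+h) are decoded, so all pixels with the same value of t = j + i*(h+1) are conditionally independent and can be decoded in one parallel step.

# Diagonal wavefront schedule for a local autoregressive model,
# under the locality assumption described above.
def wavefront_schedule(height, width, h):
    steps = {}
    for i in range(height):
        for j in range(width):
            t = j + i * (h + 1)
            steps.setdefault(t, []).append((i, j))
    return [steps[t] for t in sorted(steps)]

# Example: a 4x8 image with horizon h=2 needs far fewer sequential
# steps than fully sequential raster-scan decoding (17 vs 32).
schedule = wavefront_schedule(4, 8, 2)
print(len(schedule), "sequential steps instead of", 4 * 8)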
Abstract: Continual learning aims to learn a sequence of tasks from dynamic data distributions. Without access to the old training samples, knowledge transfer from the old tasks to each new task is difficult to determine, and it might be either positive or negative. If the old knowledge interferes with the learning of a new task, i.e., if the forward knowledge transfer is negative, then precisely remembering the old tasks will further aggravate the interference, thus decreasing the performance of continual learning. By contrast, biological neural networks can actively forget old knowledge that conflicts with the learning of a new experience, through regulating learning-triggered synaptic expansion and synaptic convergence. Inspired by biological active forgetting, we propose to actively forget the old knowledge that limits the learning of new tasks to benefit continual learning. Under the framework of Bayesian continual learning, we develop a novel approach named Active Forgetting with synaptic Expansion-Convergence (AFEC). Our method dynamically expands parameters to learn each new task and then selectively combines them, which is formally consistent with the underlying mechanism of biological active forgetting. We extensively evaluate AFEC on a variety of continual learning benchmarks, including CIFAR-10 regression tasks, visual classification tasks, and Atari reinforcement learning tasks, where AFEC effectively improves the learning of new tasks and achieves state-of-the-art performance in a plug-and-play way.
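A rough sketch in the spirit of quadratic-penalty Bayesian continual learning with an added expansion term: the total loss keeps the new-task loss, a "remembering" penalty anchoring parameters to the old-task solution (weighted by importance estimates), and a "forgetting" penalty pulling them toward parameters learned on the new task alone. The exact form and weighting used by AFEC may differ; every symbol below is an illustrative assumption.

# Illustrative combination of remembering and active-forgetting penalties.
import torch

def continual_loss(new_task_loss, params, old_params, F_old,
                   expanded_params, F_exp, lam=1.0, lam_e=1.0):
    # Quadratic penalty toward the old-task solution (remember).
    remember = sum((F * (p - p_old).pow(2)).sum()
                   for p, p_old, F in zip(params, old_params, F_old))
    # Quadratic penalty toward parameters expanded for the new task (forget).
    forget = sum((F * (p - p_e).pow(2)).sum()
                 for p, p_e, F in zip(params, expanded_params, F_exp))
    return new_task_loss + 0.5 * lam * remember + 0.5 * lam_e * forget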
Abstract: Flow-based generative models typically define a latent space with dimensionality identical to that of the observational space. In many problems, however, the data do not populate the full ambient space in which they natively reside, instead inhabiting a lower-dimensional manifold. In such scenarios, flow-based models are unable to represent the data structure exactly, as their density will always have support off the data manifold, potentially resulting in degradation of model performance. In addition, the requirement for equal latent and data space dimensionality can unnecessarily increase the complexity of contemporary flow models. Towards addressing these problems, we propose to learn a manifold prior that affords benefits to both sample generation and representation quality. An auxiliary benefit of our approach is the ability to identify the intrinsic dimension of the data distribution.
Abstract: Out-of-distribution (OOD) detection and lossless compression constitute two problems that can be solved by training probabilistic models on a first dataset and subsequently evaluating their likelihood on a second dataset, where the data distributions differ. By defining the generalization of probabilistic models in terms of likelihood, we show that, in the case of image models, the OOD generalization ability is dominated by local features. This motivates our proposal of a local autoregressive model that exclusively models local image features towards improving OOD performance. We apply the proposed model to OOD detection tasks and achieve state-of-the-art unsupervised OOD detection performance without the introduction of additional data. Additionally, we employ our model to build a new lossless image compressor, NeLLoC (Neural Local Lossless Compressor), and report state-of-the-art compression rate and model size.
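A minimal sketch of the "local autoregressive" idea: a single masked convolution whose receptive field is a small causal window, producing per-pixel categorical logits, with negative log-likelihood as an OOD statistic. Real models stack several layers; the layer sizes and the score convention here are illustrative assumptions, not the paper's architecture.

# Single-layer local autoregressive image model, for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalARModel(nn.Module):
    def __init__(self, kernel_size=5, num_vals=256):
        super().__init__()
        self.conv = nn.Conv2d(1, num_vals, kernel_size,
                              padding=kernel_size // 2)
        # Causal mask: zero out the centre pixel and everything after it,
        # so pixel (i, j) is predicted only from preceding local context.
        k = kernel_size
        mask = torch.ones_like(self.conv.weight)
        mask[:, :, k // 2, k // 2:] = 0
        mask[:, :, k // 2 + 1:, :] = 0
        self.register_buffer('mask', mask)

    def log_likelihood(self, x):
        # x: (B, 1, H, W) integer pixel values in [0, 255]
        logits = F.conv2d(x.float() / 255., self.conv.weight * self.mask,
                          self.conv.bias, padding=self.conv.padding)
        logp = F.log_softmax(logits, dim=1)
        return logp.gather(1, x.long()).sum(dim=(1, 2, 3))  # (B,)

def ood_score(model, x):
    return -model.log_likelihood(x)  # high score => likely OOD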
Abstract: Reinforcement learning algorithms, though successful, tend to over-fit to training environments, hampering their application to the real world. This paper proposes $\text{W}\text{R}^{2}\text{L}$ -- a robust reinforcement learning algorithm that delivers significant robust performance on low- and high-dimensional control tasks. Our method formalises robust reinforcement learning as a novel min-max game with a Wasserstein constraint, enabling a correct and convergent solver. Beyond the formulation, we also propose an efficient and scalable solver based on a novel zero-order optimisation method that we believe can be useful for numerical optimisation in general. We empirically demonstrate significant gains compared to standard and robust state-of-the-art algorithms on high-dimensional MuJoCo environments.
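For readers unfamiliar with zero-order methods: they estimate gradients from function evaluations alone. The sketch below shows the standard Gaussian-smoothing estimator with antithetic (central-difference) sampling, illustrating the family the paper's solver belongs to rather than its exact estimator.

# Generic zeroth-order gradient estimate via Gaussian smoothing:
# grad f(x) ~ E[ (f(x + s*u) - f(x - s*u)) / (2s) * u ],  u ~ N(0, I).
import numpy as np

def zeroth_order_grad(f, x, sigma=0.1, num_samples=64, rng=None):
    rng = rng or np.random.default_rng()
    u = rng.standard_normal((num_samples, x.size))
    # Antithetic sampling reduces the variance of the estimate.
    fx_plus = np.array([f(x + sigma * ui) for ui in u])
    fx_minus = np.array([f(x - sigma * ui) for ui in u])
    return ((fx_plus - fx_minus)[:, None] * u).mean(axis=0) / (2 * sigma)

# Example: the estimated gradient of f(x) = ||x||^2 is close to 2x.
x = np.array([1.0, -2.0])
print(zeroth_order_grad(lambda v: v @ v, x, num_samples=2048))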
Abstract: Probabilistic models are often trained by maximum likelihood, which corresponds to minimizing a specific f-divergence between the model and the data distribution. In light of recent successes in training Generative Adversarial Networks, alternative non-likelihood training criteria have been proposed. Whilst not necessarily statistically efficient, these alternatives may better match user requirements such as sharp image generation. A general variational method for training probabilistic latent variable models using maximum likelihood is well established; however, how to train latent variable models using other f-divergences is comparatively unexplored. We discuss a variational approach that, when combined with the recently introduced Spread Divergence, can be applied to train a large class of latent variable models using any f-divergence.
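For reference, the quantities involved are standard definitions (a recap, not new results): the f-divergence between the data distribution p and the model p_theta, and its "spread" version, which smooths both distributions with a noise kernel so the divergence remains well defined even when their supports differ.

% f-divergence between data distribution p(x) and model p_\theta(x):
D_f\!\left(p \,\|\, p_\theta\right) = \int p_\theta(x)\, f\!\left(\frac{p(x)}{p_\theta(x)}\right) dx,
\quad f \text{ convex},\ f(1) = 0.

% Spread divergence: smooth both distributions with a noise kernel
% k(y|x), then compare the smoothed versions:
\tilde{p}(y) = \int k(y \,|\, x)\, p(x)\, dx, \qquad
\tilde{D}_f\!\left(p \,\|\, p_\theta\right) \equiv D_f\!\left(\tilde{p} \,\|\, \tilde{p}_\theta\right).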