Mingtian Zhang

AFEC: Active Forgetting of Negative Transfer in Continual Learning

Oct 23, 2021
Liyuan Wang, Mingtian Zhang, Zhongfan Jia, Qian Li, Chenglong Bao, Kaisheng Ma, Jun Zhu, Yi Zhong

Continual learning aims to learn a sequence of tasks from dynamic data distributions. Without access to the old training samples, knowledge transfer from the old tasks to each new task is difficult to determine and might be either positive or negative. If the old knowledge interferes with the learning of a new task, i.e., the forward knowledge transfer is negative, then precisely remembering the old tasks will further aggravate the interference, thus decreasing the performance of continual learning. By contrast, biological neural networks can actively forget old knowledge that conflicts with the learning of a new experience, by regulating learning-triggered synaptic expansion and synaptic convergence. Inspired by biological active forgetting, we propose to actively forget the old knowledge that limits the learning of new tasks in order to benefit continual learning. Under the framework of Bayesian continual learning, we develop a novel approach named Active Forgetting with synaptic Expansion-Convergence (AFEC). Our method dynamically expands parameters to learn each new task and then selectively combines them, which is formally consistent with the underlying mechanism of biological active forgetting. We extensively evaluate AFEC on a variety of continual learning benchmarks, including CIFAR-10 regression tasks, visual classification tasks and Atari reinforcement tasks, where AFEC effectively improves the learning of new tasks and achieves state-of-the-art performance in a plug-and-play way.

* 35th Conference on Neural Information Processing Systems (NeurIPS 2021)  
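As a rough illustration of the expansion-convergence idea, the sketch below combines an EWC-style quadratic penalty toward the parameters consolidated on old tasks with a second penalty toward temporarily expanded parameters trained on the new task alone. The function name, the two penalty weights, and the diagonal Fisher estimates are illustrative assumptions, not the paper's exact objective.

```python
# Hypothetical sketch of an AFEC-style training objective (not the authors' code).
# params      : current model parameters being trained on the new task
# params_old  : parameters consolidated after the previous tasks ("remembering" anchor)
# params_exp  : expanded parameters fitted to the new task alone ("forgetting" anchor)
# fisher_old  : diagonal Fisher estimates weighting how important each old parameter is
# fisher_exp  : diagonal Fisher estimates for the expanded parameters
import torch

def afec_style_loss(new_task_loss, params, params_old, params_exp,
                    fisher_old, fisher_exp, lam_old=1.0, lam_exp=1.0):
    """New-task loss plus a remembering penalty and an active-forgetting penalty."""
    remember = sum((f * (p - p_old) ** 2).sum()
                   for p, p_old, f in zip(params, params_old, fisher_old))
    forget = sum((f * (p - p_exp) ** 2).sum()
                 for p, p_exp, f in zip(params, params_exp, fisher_exp))
    return new_task_loss + 0.5 * lam_old * remember + 0.5 * lam_exp * forget
```

Tuning the two weights trades off how strongly old knowledge is retained against how strongly conflicting old knowledge is overwritten in favour of the new-task solution.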

Flow Based Models For Manifold Data

Sep 29, 2021
Mingtian Zhang, Yitong Sun, Steven McDonagh, Chen Zhang

Flow-based generative models typically define a latent space with dimensionality identical to the observational space. In many problems, however, the data do not populate the full ambient space in which they natively reside, instead inhabiting a lower-dimensional manifold. In such scenarios, flow-based models are unable to represent the data structure exactly, as their density will always have support off the data manifold, potentially resulting in degraded model performance. In addition, the requirement for equal latent and data space dimensionality can unnecessarily increase the complexity of contemporary flow models. Towards addressing these problems, we propose to learn a manifold prior that benefits both sample generation and representation quality. An auxiliary benefit of our approach is the ability to identify the intrinsic dimension of the data distribution.
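One way to picture a learned manifold prior (this specific construction is an assumption for illustration, not the architecture proposed above) is a full-dimensional flow paired with a Gaussian prior whose per-dimension scales are learned: dimensions whose scales collapse toward zero carry little of the data's variability, giving a crude estimate of the intrinsic dimension.

```python
# Illustrative-only: a Gaussian prior with learnable per-dimension scales.
# Latent dimensions whose learned scale shrinks toward zero can be read as
# "off-manifold", so counting the remaining dimensions gives a rough
# intrinsic-dimension estimate. The threshold below is an arbitrary choice.
import torch
import torch.nn as nn

class LearnableScalePrior(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(dim))  # one scale per latent dimension

    def log_prob(self, z):
        scale = self.log_scale.exp()
        normal = torch.distributions.Normal(torch.zeros_like(scale), scale)
        return normal.log_prob(z).sum(dim=-1)  # factorised log-density of z

    def intrinsic_dim(self, threshold=1e-2):
        return int((self.log_scale.exp() > threshold).sum())
```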

On the Out-of-distribution Generalization of Probabilistic Image Modelling

Sep 04, 2021
Mingtian Zhang, Andi Zhang, Steven McDonagh

Out-of-distribution (OOD) detection and lossless compression constitute two problems that can be solved by training probabilistic models on a first dataset and subsequently evaluating likelihood on a second dataset whose data distribution differs. By defining the generalization of probabilistic models in terms of likelihood, we show that, in the case of image models, OOD generalization ability is dominated by local features. This motivates our proposal of a Local Autoregressive model that exclusively models local image features, towards improving OOD performance. We apply the proposed model to OOD detection tasks and achieve state-of-the-art unsupervised OOD detection performance without the introduction of additional data. Additionally, we employ our model to build a new lossless image compressor, NeLLoC (Neural Local Lossless Compressor), and report state-of-the-art compression rates and model size.
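The claim that OOD generalization of image likelihoods is dominated by local features suggests an autoregressive model whose receptive field is deliberately kept small. In the sketch below, a single masked convolution plays that role and images are scored by their per-pixel categorical log-likelihood; the layer sizes, the `MaskedConv2d` helper, and the scoring rule are assumptions for illustration, not the NeLLoC implementation.

```python
# Minimal sketch of a "local" autoregressive image model: one masked
# convolution gives every pixel a small causal receptive field, and the
# summed per-pixel log-likelihood can serve as an OOD score or as the
# entropy model of a lossless coder. Sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Causal (type-A) mask: a pixel never conditions on itself or on future pixels."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        _, _, kh, kw = self.weight.shape
        mask = torch.ones(kh, kw)
        mask[kh // 2, kw // 2:] = 0   # zero out the centre pixel and pixels to its right
        mask[kh // 2 + 1:, :] = 0     # zero out all rows below the centre
        self.register_buffer("mask", mask[None, None])

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding)

class LocalARModel(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.conv = MaskedConv2d(1, hidden, kernel_size=5, padding=2)
        self.head = nn.Conv2d(hidden, 256, kernel_size=1)  # logits over 256 grey levels

    def log_likelihood(self, x_uint8):
        x = x_uint8.float() / 255.0
        logits = self.head(F.relu(self.conv(x)))
        nll = F.cross_entropy(logits, x_uint8.squeeze(1).long(), reduction="none")
        return -nll.sum(dim=(1, 2))  # per-image log-likelihood in nats
```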

Wasserstein Robust Reinforcement Learning

Sep 16, 2019
Mohammed Amin Abdullah, Hang Ren, Haitham Bou Ammar, Vladimir Milenkovic, Rui Luo, Mingtian Zhang, Jun Wang

Reinforcement learning algorithms, though successful, tend to overfit to their training environments, hampering their application to the real world. This paper proposes $\text{WR}^{2}\text{L}$ -- a robust reinforcement learning algorithm with significant robust performance on low- and high-dimensional control tasks. Our method formalises robust reinforcement learning as a novel min-max game with a Wasserstein constraint, which admits a correct and convergent solver. Apart from the formulation, we also propose an efficient and scalable solver based on a novel zero-order optimisation method that we believe can be useful to numerical optimisation in general. We empirically demonstrate significant gains compared to standard and robust state-of-the-art algorithms on high-dimensional MuJoCo environments.
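A hedged sketch of the zero-order ingredient: the adversary perturbs environment (dynamics) parameters, estimates the gradient of the policy's expected return purely from rollouts via Gaussian smoothing, and then steps while staying close to the nominal dynamics. The `evaluate_return` helper is hypothetical, and the simple norm-ball projection below is only a stand-in for the paper's Wasserstein constraint.

```python
# Illustrative zero-order adversary over environment parameters phi
# (e.g. masses, frictions). `evaluate_return(policy, phi)` is a hypothetical
# helper that rolls the policy out in an environment configured by phi.
import numpy as np

def zero_order_grad(evaluate_return, policy, phi, sigma=0.05, n_samples=16):
    """Estimate d/dphi of the expected return by two-sided Gaussian smoothing."""
    grad = np.zeros_like(phi)
    for _ in range(n_samples):
        eps = np.random.randn(*phi.shape)
        delta = (evaluate_return(policy, phi + sigma * eps)
                 - evaluate_return(policy, phi - sigma * eps))
        grad += delta * eps / (2.0 * sigma)
    return grad / n_samples

def adversary_step(evaluate_return, policy, phi, phi_nominal, lr=0.01, radius=0.1):
    """Descend the return, then project back into a ball around the nominal
    dynamics -- a crude proxy for the Wasserstein constraint."""
    phi = phi - lr * zero_order_grad(evaluate_return, policy, phi)
    offset = phi - phi_nominal
    norm = np.linalg.norm(offset)
    if norm > radius:
        phi = phi_nominal + offset * (radius / norm)
    return phi
```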

Variational f-divergence Minimization

Jul 27, 2019
Mingtian Zhang, Thomas Bird, Raza Habib, Tianlin Xu, David Barber

Probabilistic models are often trained by maximum likelihood, which corresponds to minimizing a specific f-divergence between the model and data distribution. In light of recent successes in training Generative Adversarial Networks, alternative non-likelihood training criteria have been proposed. Whilst not necessarily statistically efficient, these alternatives may better match user requirements such as sharp image generation. A general variational method for training probabilistic latent variable models using maximum likelihood is well established; however, how to train latent variable models using other f-divergences is comparatively unknown. We discuss a variational approach that, when combined with the recently introduced Spread Divergence, can be applied to train a large class of latent variable models using any f-divergence.
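For orientation, the quantity being minimized and one standard route to a tractable objective for latent-variable models are written out below; whether this joint-divergence bound is exactly the construction used in the paper is not claimed here, but the inequality itself follows from the data-processing property of f-divergences.

```latex
% f-divergence between data distribution p and model q, for convex f with f(1)=0:
D_f(p \,\|\, q) \;=\; \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx .

% For a latent-variable model q(x) = \int q(x|z)\, q(z)\, dz, the marginal
% divergence is intractable, but for any auxiliary conditional r(z|x) the
% data-processing inequality bounds it by a divergence between joints:
D_f\big(p(x) \,\|\, q(x)\big) \;\le\; D_f\big(p(x)\, r(z|x) \,\|\, q(x, z)\big).
```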

Spread Divergences

Dec 02, 2018
David Barber, Mingtian Zhang, Raza Habib, Thomas Bird

For distributions p and q with different supports, a divergence between them generally does not exist. We define a spread divergence on modified versions of p and q and describe sufficient conditions for the existence of such a divergence. We give examples of using a spread divergence to train implicit generative models, including linear models (Principal Components Analysis and Independent Components Analysis) and non-linear models (Deep Generative Networks).
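The construction sketched in the abstract can be stated compactly: both distributions are convolved with the same noise kernel so that they acquire common support, and the divergence is then taken between the noisy versions. The Gaussian kernel mentioned below is one standard choice rather than the only one.

```latex
% Spread both distributions with the same noise kernel K(y|x), e.g. Gaussian:
\tilde{p}(y) = \int K(y|x)\, p(x)\, dx, \qquad
\tilde{q}(y) = \int K(y|x)\, q(x)\, dx .

% The spread divergence applies the original divergence D to the noisy pair,
% which now share support; for suitable (injective) kernels it still
% identifies the distributions:
\tilde{D}(p, q) \;\equiv\; D\big(\tilde{p} \,\|\, \tilde{q}\big),
\qquad \tilde{D}(p, q) = 0 \;\Longleftrightarrow\; p = q .
```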
