Title: Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing

Mar 25, 2019

Hao Fu*, Chunyuan Li*, Xiaodong Liu, Jianfeng Gao, Asli Celikyilmaz, Lawrence Carin

Abstract: Variational autoencoders (VAEs) with an auto-regressive decoder have been applied for many natural language processing (NLP) tasks. The VAE objective consists of two terms, (i) reconstruction and (ii) KL regularization, balanced by a weighting hyper-parameter $\beta$. One notorious training difficulty is that the KL term tends to vanish. In this paper we study scheduling schemes for $\beta$, and show that KL vanishing is caused by the lack of good latent codes in training the decoder at the beginning of optimization. To remedy this, we propose a cyclical annealing schedule, which repeats the process of increasing $\beta$ multiple times. This new procedure allows the progressive learning of more meaningful latent codes, by leveraging the informative representations of previous cycles as warm re-starts. The effectiveness of cyclical annealing is validated on a broad range of NLP tasks, including language modeling, dialog response generation and unsupervised language pre-training.

* Published in NAACL 2019. The first two authors contributed equally. Code: https://github.com/haofuml/cyclical_annealing
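The schedule described in the abstract, where the process of increasing $\beta$ is repeated several times, can be illustrated with a minimal sketch. The abstract does not specify the shape of each increase, so the linear ramp followed by a hold at the maximum value is an assumption here, as are the function name `cyclical_beta_schedule` and the parameters `n_cycles`, `ramp_ratio`, and `beta_max`. This is not taken from the authors' repository.

```python
def cyclical_beta_schedule(n_iters, n_cycles=4, ramp_ratio=0.5, beta_max=1.0):
    """Return one beta value per training iteration.

    Assumptions (not stated in the abstract): within each cycle, beta rises
    linearly from 0 to beta_max over the first `ramp_ratio` fraction of the
    cycle, then stays at beta_max until the cycle ends and beta resets to 0.
    """
    if n_iters <= 0 or n_cycles <= 0:
        raise ValueError("n_iters and n_cycles must be positive")
    if not 0.0 < ramp_ratio <= 1.0:
        raise ValueError("ramp_ratio must be in (0, 1]")

    cycle_len = n_iters / n_cycles  # may be fractional
    betas = []
    for t in range(n_iters):
        # Relative position inside the current cycle, in [0, 1).
        pos = (t % cycle_len) / cycle_len
        # Linear ramp, clipped once the ramp phase is over.
        betas.append(beta_max * min(1.0, pos / ramp_ratio))
    return betas


# Hypothetical usage inside a training loop: the VAE loss at step t would be
#   loss = reconstruction_loss + betas[t] * kl_divergence
betas = cyclical_beta_schedule(n_iters=8, n_cycles=2, ramp_ratio=0.5)
```

Each reset of $\beta$ to zero lets the model re-open the latent channel while starting from the encoder and decoder learned in the previous cycle, which is the "warm re-start" the abstract refers to.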
