Title: MLE-guided parameter search for task loss minimization in neural sequence modeling

Jun 04, 2020

Sean Welleck, Kyunghyun Cho

Figures 1-4 for MLE-guided parameter search for task loss minimization in neural sequence modeling

Abstract: Neural autoregressive sequence models are used to generate sequences in a variety of natural language processing (NLP) tasks, where they are evaluated according to sequence-level task losses. These models are typically trained with maximum likelihood estimation, which ignores the task loss, yet empirically performs well as a surrogate objective. Typical approaches to directly optimizing the task loss such as policy gradient and minimum risk training are based around sampling in the sequence space to obtain candidate update directions that are scored based on the loss of a single sequence. In this paper, we develop an alternative method based on random search in the parameter space that leverages access to the maximum likelihood gradient. We propose maximum likelihood guided parameter search (MGS), which samples from a distribution over update directions that is a mixture of random search around the current parameters and around the maximum likelihood gradient, with each direction weighted by its improvement in the task loss. MGS shifts sampling to the parameter space, and scores candidates using losses that are pooled from multiple sequences. Our experiments show that MGS is capable of optimizing sequence-level losses, with substantial reductions in repetition and non-termination in sequence completion, and similar improvements to those of minimum risk training in machine translation.
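The update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mgs_step`, its parameters, the Gaussian form of the two mixture components, and the softmax weighting over task-loss improvements are all assumptions made for illustration. The abstract only states that directions are sampled from a mixture centered on the current parameters and on the maximum likelihood gradient, and that each direction is weighted by its improvement in the task loss.

```python
import numpy as np


def mgs_step(theta, task_loss, mle_direction, num_candidates=8,
             sigma=0.1, mix=0.5, temperature=1.0, rng=None):
    """One illustrative MGS-style update on a flat parameter vector.

    Hypothetical helper, not taken from the paper's code.
    theta:         1-D array of current parameters.
    task_loss:     callable theta -> float; assumed to pool the
                   sequence-level loss over several sequences.
    mle_direction: callable theta -> 1-D array; assumed to return the
                   likelihood-improving step (e.g. the negative gradient
                   of the negative log-likelihood, already scaled).
    """
    rng = np.random.default_rng() if rng is None else rng
    base_loss = task_loss(theta)
    g = mle_direction(theta)

    # Pick a mixture component per candidate: centered on the MLE
    # direction with probability `mix`, otherwise on zero (pure random
    # search around the current parameters).
    around_mle = rng.random(num_candidates) < mix
    means = np.where(around_mle[:, None], g[None, :], 0.0)
    noise = sigma * rng.standard_normal((num_candidates, theta.size))
    directions = means + noise

    # Score each candidate by how much it lowers the task loss.
    improvements = np.array(
        [base_loss - task_loss(theta + d) for d in directions]
    )

    # Assumption: turn improvements into weights with a softmax.
    logits = improvements / temperature
    logits = logits - logits.max()  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum()

    # Move along the weighted combination of candidate directions.
    new_theta = theta + weights @ directions
    return new_theta, weights
```

In this sketch a larger `mix` leans on the maximum likelihood direction, while `mix=0` reduces to plain weighted random search. In a real model, `theta` would be the flattened network parameters and `task_loss` would decode several sequences and average a sequence-level metric such as a repetition or non-termination penalty.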
