Title: Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models

Oct 12, 2020

Zirui Wang, Yulia Tsvetkov, Orhan Firat, Yuan Cao

Figures 1-4 for Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models

Abstract: Massively multilingual models subsuming tens or even hundreds of languages pose great challenges to multi-task optimization. While it is a common practice to apply a language-agnostic procedure optimizing a joint multilingual task objective, how to properly characterize and take advantage of its underlying problem structure for improving optimization efficiency remains under-explored. In this paper, we attempt to peek into the black-box of multilingual optimization through the lens of loss function geometry. We find that gradient similarity measured along the optimization trajectory is an important signal, which correlates well with not only language proximity but also the overall model performance. Such observation helps us to identify a critical limitation of existing gradient-based multi-task learning methods, and thus we derive a simple and scalable optimization procedure, named Gradient Vaccine, which encourages more geometrically aligned parameter updates for close tasks. Empirically, our method obtains significant model performance gains on multilingual machine translation and XTREME benchmark tasks for multilingual language models. Our work reveals the importance of properly measuring and utilizing language proximity in multilingual optimization, and has broader implications for multi-task learning beyond multilingual modeling.
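
The abstract refers to gradient similarity between tasks as the signal of interest. As a rough illustration only, the sketch below computes the cosine similarity between two flattened task gradients, which is one common way to measure such similarity. It is not the Gradient Vaccine procedure itself, whose update rule the abstract does not spell out. The function name `cosine_similarity`, the example vectors, and the zero-gradient convention are all assumptions made for this example.

```python
import math


def cosine_similarity(g1, g2):
    """Cosine of the angle between two flattened gradient vectors.

    Values near 1 mean the two tasks push shared parameters in a similar
    direction; values near -1 mean the gradients conflict.
    """
    dot = sum(a * b for a, b in zip(g1, g2))
    norm1 = math.sqrt(sum(a * a for a in g1))
    norm2 = math.sqrt(sum(b * b for b in g2))
    if norm1 == 0.0 or norm2 == 0.0:
        # Assumed convention for this sketch: a zero gradient has similarity 0.
        return 0.0
    return dot / (norm1 * norm2)


# Hypothetical gradients of the same shared parameters for three tasks.
grad_task_a = [1.0, 2.0, 0.0]
grad_task_b = [2.0, 4.0, 0.0]    # same direction as task a
grad_task_c = [-1.0, -2.0, 0.0]  # opposite direction to task a

sim_ab = cosine_similarity(grad_task_a, grad_task_b)
sim_ac = cosine_similarity(grad_task_a, grad_task_c)
```

In a multilingual setting, each vector would plausibly be the gradient of one language's loss with respect to the shared parameters, recomputed at intervals along training to track how the similarity evolves.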

View paper on OpenReview
