Melvin Johnson

Explicit Alignment Objectives for Multilingual Bidirectional Encoders

Oct 15, 2020

Junjie Hu, Melvin Johnson, Orhan Firat, Aditya Siddhant, Graham Neubig

Abstract: Pre-trained cross-lingual encoders such as mBERT (Devlin et al., 2019) and XLMR (Conneau et al., 2020) have proven to be impressively effective at enabling transfer-learning of NLP systems from high-resource languages to low-resource languages. This success comes despite the fact that there is no explicit objective to align the contextual embeddings of words/sentences with similar meanings across languages together in the same space. In this paper, we present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bidirectional EncodeR). AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities. We conduct experiments on zero-shot cross-lingual transfer learning for different tasks including sequence tagging, sentence retrieval and sentence classification. Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model which has 4.6x the parameters of AMBER.

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Apr 10, 2020

Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson

Abstract: Much recent progress in applications of machine learning models to NLP has been driven by benchmarks that evaluate models across a wide variety of tasks. However, these broad-coverage benchmarks have been mostly limited to English, and despite an increasing interest in multilingual models, a benchmark that enables the comprehensive evaluation of such methods on a diverse range of languages and tasks is still missing. To this end, we introduce the Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks. We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models, particularly on syntactic and sentence retrieval tasks. There is also a wide spread of results across languages. We release the benchmark to encourage research on cross-lingual learning methods that transfer linguistic knowledge across a diverse and representative set of languages and tasks.

Adaptive Scheduling for Multi-Task Learning

Sep 13, 2019

Sébastien Jean, Orhan Firat, Melvin Johnson

Abstract: To train neural machine translation models simultaneously on multiple tasks (languages), it is common to sample each task uniformly or in proportion to dataset sizes. As these methods offer little control over performance trade-offs, we explore different task scheduling approaches. We first consider existing non-adaptive techniques, then move on to adaptive schedules that over-sample tasks with poorer results compared to their respective baseline. As explicit schedules can be inefficient, especially if one task is highly over-sampled, we also consider implicit schedules, learning to scale learning rates or gradients of individual tasks instead. These techniques allow training multilingual models that perform better for low-resource language pairs (tasks with a small amount of data), while minimizing negative effects on high-resource tasks.

* Continual Learning Workshop at NeurIPS 2018
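
The sampling strategies named in the abstract above (uniform, size-proportional, and an adaptive schedule that over-samples tasks trailing their baseline) can be illustrated with a minimal sketch. The function names, the example task names, and in particular the `baseline / current` weighting are assumptions made for illustration; the paper's actual schedules are not given in this listing and may differ.

```python
import random


def uniform_probs(sizes):
    """Sample every task with equal probability."""
    return {task: 1.0 / len(sizes) for task in sizes}


def proportional_probs(sizes):
    """Sample each task in proportion to its dataset size."""
    total = sum(sizes.values())
    return {task: n / total for task, n in sizes.items()}


def adaptive_probs(current, baseline, eps=1e-8):
    """Over-sample tasks whose current score trails their baseline.

    Hypothetical weighting (not taken from the paper): weight = baseline / current,
    so a task at half its baseline score gets twice the weight of a task that
    has reached its baseline. Scores are assumed to be "higher is better".
    """
    weights = {t: baseline[t] / max(current[t], eps) for t in current}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def sample_task(probs, rng=random):
    """Draw one task name according to the given probabilities."""
    tasks = list(probs)
    return rng.choices(tasks, weights=[probs[t] for t in tasks], k=1)[0]


if __name__ == "__main__":
    # Hypothetical dataset sizes and scores for two language pairs.
    sizes = {"en-fr": 900, "en-gu": 100}
    current = {"en-fr": 30.0, "en-gu": 5.0}
    baseline = {"en-fr": 30.0, "en-gu": 10.0}

    print(uniform_probs(sizes))
    print(proportional_probs(sizes))
    print(adaptive_probs(current, baseline))
    print(sample_task(adaptive_probs(current, baseline)))
```

Recomputing `adaptive_probs` periodically from fresh validation scores would make the schedule track training progress; how often to do so is a design choice this sketch leaves open.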

Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation

Sep 01, 2019

Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Arivazhagan, Jason Riesa, Ankur Bapna, Orhan Firat, Karthik Raman

Abstract: The recently proposed massively multilingual neural machine translation (NMT) system has been shown to be capable of translating over 100 languages to and from English within a single model. Its improved translation performance on low resource languages hints at potential cross-lingual transfer capability for downstream tasks. In this paper, we evaluate the cross-lingual effectiveness of representations from the encoder of a massively multilingual NMT model on 5 downstream classification and sequence labeling tasks covering a diverse set of over 50 languages. We compare against a strong baseline, multilingual BERT (mBERT), in different cross-lingual transfer learning scenarios and show gains in zero-shot transfer in 4 out of these 5 tasks.

Small and Practical BERT Models for Sequence Labeling

Aug 31, 2019

Henry Tsai, Jason Riesa, Melvin Johnson, Naveen Arivazhagan, Xin Li, Amelia Archer

Abstract: We propose a practical scheme to train a single multilingual sequence labeling model that yields state-of-the-art results and is small and fast enough to run on a single CPU. Starting from a public multilingual BERT checkpoint, our final model is 6x smaller and 27x faster, and has higher accuracy than a state-of-the-art multilingual baseline. We show that our model especially outperforms on low-resource languages, and works on codemixed input text without being explicitly trained on codemixed examples. We showcase the effectiveness of our method by reporting on part-of-speech tagging and morphological prediction on 70 treebanks and 48 languages.

* 11 pages including appendices; accepted to appear at EMNLP-IJCNLP 2019

Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges

Jul 11, 2019

Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Dmitry Lepikhin, Melvin Johnson, Maxim Krikun, Mia Xu Chen, Yuan Cao, George Foster, Colin Cherry (+3 more)

Abstract: We introduce our efforts towards building a universal neural machine translation (NMT) system capable of translating between any language pair. We set a milestone towards this goal by building a single massively multilingual NMT model handling 103 languages trained on over 25 billion examples. Our system demonstrates effective transfer learning ability, significantly improving translation quality of low-resource languages, while keeping high-resource language translation quality on-par with competitive bilingual baselines. We provide in-depth analysis of various aspects of model building that are crucial to achieving quality and practicality in universal NMT. While we prototype a high-quality universal translation system, our extensive empirical analysis exposes issues that need to be further addressed, and we suggest directions for future research.

Direct speech-to-speech translation with a sequence-to-sequence model

Apr 12, 2019

Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, Yonghui Wu

Abstract: We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation. The network is trained end-to-end, learning to map speech spectrograms into target spectrograms in another language, corresponding to the translated content (in a different canonical voice). We further demonstrate the ability to synthesize translated speech using the voice of the source speaker. We conduct experiments on two Spanish-to-English speech translation datasets, and find that the proposed model slightly underperforms a baseline cascade of a direct speech-to-text translation model and a text-to-speech synthesis model, demonstrating the feasibility of the approach on this very challenging task.

* Submitted to Interspeech 2019

The Missing Ingredient in Zero-Shot Neural Machine Translation

Mar 17, 2019

Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Roee Aharoni, Melvin Johnson, Wolfgang Macherey

Abstract: Multilingual Neural Machine Translation (NMT) models are capable of translating between multiple source and target languages. Despite various approaches to train such models, they have difficulty with zero-shot translation: translating between language pairs that were not seen together during training. In this paper we first diagnose why state-of-the-art multilingual NMT models that rely purely on parameter sharing fail to generalize to unseen language pairs. We then propose auxiliary losses on the NMT encoder that impose representational invariance across languages. Our simple approach vastly improves zero-shot translation quality without regressing on supervised directions. For the first time, on WMT14 English-French/German, we achieve zero-shot performance that is on par with pivoting. We also demonstrate the easy scalability of our approach to multiple languages on the IWSLT 2017 shared task.

Massively Multilingual Neural Machine Translation

Feb 28, 2019

Roee Aharoni, Melvin Johnson, Orhan Firat

Abstract: Multilingual neural machine translation (NMT) enables training a single model that supports translation from multiple source languages into multiple target languages. In this paper, we push the limits of multilingual NMT in terms of number of languages being used. We perform extensive experiments in training massively multilingual NMT models, translating up to 102 languages to and from English within a single model. We explore different setups for training such models and analyze the trade-offs between translation quality and various modeling decisions. We report results on the publicly available TED talks multilingual corpus where we show that massively multilingual many-to-many models are effective in low resource settings, outperforming the previous state-of-the-art while supporting up to 59 languages. Our experiments on a large-scale dataset with 102 languages to and from English and up to one million examples per direction also show promising results, surpassing strong bilingual baselines and encouraging future work on massively multilingual NMT.

* Accepted as a long paper in NAACL 2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

Feb 21, 2019

Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu (+81 more)

Abstract: Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly within the framework, and it contains existing implementations of a large number of utilities, helper functions, and the newest research ideas. Lingvo has been used in collaboration by dozens of researchers in more than 20 papers over the last two years. This document outlines the underlying design of Lingvo and serves as an introduction to the various pieces of the framework, while also offering examples of advanced features that showcase the capabilities of the framework.
