Guillaume Wenzek

NLLB Team

Facebook AI's WAT19 Myanmar-English Translation Task Submission

Oct 15, 2019

Peng-Jen Chen, Jiajun Shen, Matt Le, Vishrav Chaudhary, Ahmed El-Kishky, Guillaume Wenzek, Myle Ott, Marc'Aurelio Ranzato

Abstract: This paper describes Facebook AI's submission to the WAT 2019 Myanmar-English translation task. Our baseline systems are BPE-based transformer models. We explore methods to leverage monolingual data to improve generalization, including self-training, back-translation and their combination. We further improve results by using noisy channel re-ranking and ensembling. We demonstrate that these techniques can significantly improve not only a system trained with additional monolingual data, but even the baseline system trained exclusively on the provided small parallel dataset. Our system ranks first in both directions according to human evaluation and BLEU, with a gain of over 8 BLEU points over the second-best system.

* The 6th Workshop on Asian Translation

Trans-gram, Fast Cross-lingual Word-embeddings

Jan 11, 2016

Jocelyn Coulmance, Jean-Marc Marty, Guillaume Wenzek, Amine Benhalloum

Abstract: We introduce Trans-gram, a simple and computationally efficient method to simultaneously learn and align word embeddings for a variety of languages, using only monolingual data and a smaller set of sentence-aligned data. We use our new method to compute aligned word embeddings for twenty-one languages using English as a pivot language. We show that some linguistic features are aligned across languages for which we do not have aligned data, even though those properties do not exist in the pivot language. We also achieve state-of-the-art results on standard cross-lingual text classification and word translation tasks.

* EMNLP 2015
