Title: Integrating Unsupervised Data Generation into Self-Supervised Neural Machine Translation for Low-Resource Languages

Jul 19, 2021

Dana Ruiter, Dietrich Klakow, Josef van Genabith, Cristina España-Bonet

Abstract: For most language combinations, parallel data is either scarce or simply unavailable. To address this, unsupervised machine translation (UMT) exploits large amounts of monolingual data by using synthetic data generation techniques such as back-translation and noising, while self-supervised NMT (SSNMT) identifies parallel sentences in smaller comparable data and trains on them. To date, the inclusion of UMT data generation techniques in SSNMT has not been investigated. We show that integrating UMT techniques into SSNMT significantly outperforms both SSNMT and UMT on all tested language pairs, with improvements of up to +4.3, +50.8 and +51.5 BLEU over SSNMT, statistical UMT and hybrid UMT, respectively, on Afrikaans to English. We further show that the combination of multilingual denoising autoencoding, SSNMT with back-translation and bilingual finetuning enables us to learn machine translation even for distant language pairs for which only small amounts of monolingual data are available, e.g. yielding a BLEU score of 11.6 for English to Swahili.

* 11 pages, 8 figures, accepted at MT-Summit 2021 (Research Track)
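
The abstract names noising as one of the synthetic data generation techniques borrowed from UMT. As a rough illustration only, the sketch below shows a noise function of the kind commonly used for denoising objectives in unsupervised MT: random word dropout followed by a local shuffle. It is not the authors' implementation; the function name `add_noise`, its parameters and their default values are assumptions made for this example.

```python
import random


def add_noise(tokens, drop_prob=0.1, max_shuffle_distance=3, rng=None):
    """Return a noised copy of `tokens` (hypothetical helper, not from the paper).

    Two illustrative noise types are applied in sequence:
    word dropout, then a local shuffle that keeps tokens near their origin.
    """
    rng = rng or random.Random()

    # Word dropout: drop each token independently with probability drop_prob.
    kept = [tok for tok in tokens if rng.random() >= drop_prob]
    if not kept and tokens:
        # Keep at least one token so the noised sentence is never empty.
        kept = [rng.choice(tokens)]

    # Local shuffle: perturb each position by a random offset in
    # [0, max_shuffle_distance] and sort by the perturbed position.
    # Python's sort is stable, so an offset of 0 preserves the order.
    keys = [i + rng.uniform(0, max_shuffle_distance) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]


if __name__ == "__main__":
    sentence = "only small amounts of monolingual data are available".split()
    print(add_noise(sentence, rng=random.Random(0)))
```

A model trained to reconstruct the original sentence from such a noised copy gets a training signal from monolingual text alone, which is why this kind of data generation is attractive when parallel data is scarce.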
