Title: Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation

Jun 09, 2020

Changhan Wang, Juan Pino, Jiatao Gu

Abstract: Transfer learning from high-resource languages is known to be an efficient way to improve end-to-end automatic speech recognition (ASR) for low-resource languages. Pre-trained or jointly trained encoder-decoder models, however, do not share the language modeling (decoder) for the same language, which is likely to be inefficient for distant target languages. We introduce speech-to-text translation (ST) as an auxiliary task to incorporate additional knowledge of the target language and enable transferring from that target language. Specifically, we first translate high-resource ASR transcripts into a target low-resource language, and use the translations to train an ST model. Both ST and target ASR share the same attention-based encoder-decoder architecture and vocabulary. The former task then provides a fully pre-trained model for the latter, bringing up to 24.6% word error rate (WER) reduction over the baseline (direct transfer from high-resource ASR). We show that training ST with human translations is not necessary. ST trained with machine translation (MT) pseudo-labels brings consistent gains. It can even outperform ST trained with human labels when transferred to target ASR, while leveraging only 500K MT examples. Even with pseudo-labels from low-resource MT (200K examples), ST-enhanced transfer brings up to 8.9% WER reduction over direct transfer.

* Submitted to INTERSPEECH 2020
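
The first step described in the abstract, turning high-resource ASR data into pseudo-labeled ST data, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `AsrExample`, `StExample`, and `build_pseudo_st_corpus` are hypothetical, and `translate` stands in for whatever MT system produces the target-language text. Training the ST model and then fine-tuning it on target-language ASR are not shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class AsrExample:
    """One utterance from the high-resource ASR corpus (hypothetical type)."""
    audio_path: str   # source-language audio
    transcript: str   # source-language transcript


@dataclass
class StExample:
    """One pseudo-labeled ST example (hypothetical type)."""
    audio_path: str   # the same source-language audio
    translation: str  # target-language text produced by MT


def build_pseudo_st_corpus(
    asr_corpus: Iterable[AsrExample],
    translate: Callable[[str], str],
) -> List[StExample]:
    """Pair each utterance's audio with an MT translation of its transcript.

    `translate` is an assumed callable mapping source-language text to
    target-language text; any MT system could be plugged in here.
    """
    st_corpus: List[StExample] = []
    for example in asr_corpus:
        target_text = translate(example.transcript).strip()
        if not target_text:
            # Skip utterances for which the MT system returned nothing.
            continue
        st_corpus.append(StExample(example.audio_path, target_text))
    return st_corpus


if __name__ == "__main__":
    # Toy dictionary "translator", used only to make the sketch runnable.
    toy_mt = {"hello world": "bonjour le monde"}
    corpus = [
        AsrExample("utt1.wav", "hello world"),
        AsrExample("utt2.wav", "not in the toy table"),
    ]
    for st in build_pseudo_st_corpus(corpus, lambda s: toy_mt.get(s, "")):
        print(st.audio_path, "->", st.translation)
```

Because the resulting examples pair source-language audio with target-language text, an ST model trained on them uses a decoder over the target-language vocabulary, which is what allows the whole encoder-decoder model to be reused to initialize target ASR.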
