Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora

Dec 31, 2020
Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang

Share this with someone who'll enjoy it:

Recent studies have demonstrated that pre-trained cross-lingual models achieve impressive performance on downstream cross-lingual tasks. This improvement stems from the learning of a large amount of monolingual and parallel corpora. While it is generally acknowledged that parallel corpora are critical for improving the model performance, existing methods are often constrained by the size of parallel corpora, especially for the low-resource languages. In this paper, we propose ERNIE-M, a new training method that encourages the model to align the representation of multiple languages with monolingual corpora, to break the constraint of parallel corpus size on the model performance. Our key insight is to integrate the idea of back translation in the pre-training process. We generate pseudo-parallel sentences pairs on a monolingual corpus to enable the learning of semantic alignment between different languages, which enhances the semantic modeling of cross-lingual models. Experimental results show that ERNIE-M outperforms existing cross-lingual models and delivers new state-of-the-art results on various cross-lingual downstream tasks. The codes and pre-trained models will be made publicly available.

   Access Paper Source

Share this with someone who'll enjoy it: