Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zewen Chi

XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders

Dec 31, 2020

Shuming Ma, Jian Yang, Haoyang Huang, Zewen Chi, Li Dong, Dongdong Zhang, Hany Hassan Awadalla, Alexandre Muzio, Akiko Eriguchi, Saksham Singhal(+3 more)

Figure 1 for XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders

Figure 2 for XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders

Figure 3 for XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders

Figure 4 for XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders

Abstract:Multilingual machine translation enables a single model to translate between different languages. Most existing multilingual machine translation systems adopt a randomly initialized Transformer backbone. In this work, inspired by the recent success of language model pre-training, we present XLM-T, which initializes the model with an off-the-shelf pretrained cross-lingual Transformer encoder and fine-tunes it with multilingual parallel data. This simple method achieves significant improvements on a WMT dataset with 10 language pairs and the OPUS-100 corpus with 94 pairs. Surprisingly, the method is also effective even upon the strong baseline with back-translation. Moreover, extensive analysis of XLM-T on unsupervised syntactic parsing, word alignment, and multilingual classification explains its effectiveness for machine translation. The code will be at https://aka.ms/xlm-t.

Via

Access Paper or Ask Questions

InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

Jul 15, 2020

Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal, Wenhui Wang, Xia Song, Xian-Ling Mao, Heyan Huang, Ming Zhou

Figure 1 for InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

Figure 2 for InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

Figure 3 for InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

Figure 4 for InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

Abstract:In this work, we formulate cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts. The unified view helps us to better understand the existing methods for learning cross-lingual representations. More importantly, the information-theoretic framework inspires us to propose a pre-training task based on contrastive learning. Given a bilingual sentence pair, we regard them as two views of the same meaning, and encourage their encoded representations to be more similar than the negative examples. By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models. Experimental results on several benchmarks show that our approach achieves considerably better performance. The code and pre-trained models are available at http://aka.ms/infoxlm.

* 11 pages

Via

Access Paper or Ask Questions

Generating Informative Dialogue Responses with Keywords-Guided Networks

Jul 03, 2020

Heng-Da Xu, Xian-Ling Mao, Zewen Chi, Jing-Jing Zhu, Fanshu Sun, Heyan Huang

Figure 1 for Generating Informative Dialogue Responses with Keywords-Guided Networks

Figure 2 for Generating Informative Dialogue Responses with Keywords-Guided Networks

Figure 3 for Generating Informative Dialogue Responses with Keywords-Guided Networks

Figure 4 for Generating Informative Dialogue Responses with Keywords-Guided Networks

Abstract:Recently, open-domain dialogue systems have attracted growing attention. Most of them use the sequence-to-sequence (Seq2Seq) architecture to generate responses. However, traditional Seq2Seq-based open-domain dialogue models tend to generate generic and safe responses, which are less informative, unlike human responses. In this paper, we propose a simple but effective keywords-guided Sequence-to-Sequence model (KW-Seq2Seq) which uses keywords information as guidance to generate open-domain dialogue responses. Specifically, KW-Seq2Seq first uses a keywords decoder to predict some topic keywords, and then generates the final response under the guidance of them. Extensive experiments demonstrate that the KW-Seq2Seq model produces more informative, coherent and fluent responses, yielding substantive gain in both automatic and human evaluation metrics.

Via

Access Paper or Ask Questions

Can Monolingual Pretrained Models Help Cross-Lingual Classification?

Nov 10, 2019

Zewen Chi, Li Dong, Furu Wei, Xian-Ling Mao, Heyan Huang

Figure 1 for Can Monolingual Pretrained Models Help Cross-Lingual Classification?

Figure 2 for Can Monolingual Pretrained Models Help Cross-Lingual Classification?

Figure 3 for Can Monolingual Pretrained Models Help Cross-Lingual Classification?

Figure 4 for Can Monolingual Pretrained Models Help Cross-Lingual Classification?

Abstract:Multilingual pretrained language models (such as multilingual BERT) have achieved impressive results for cross-lingual transfer. However, due to the constant model capacity, multilingual pre-training usually lags behind the monolingual competitors. In this work, we present two approaches to improve zero-shot cross-lingual classification, by transferring the knowledge from monolingual pretrained models to multilingual ones. Experimental results on two cross-lingual classification benchmarks show that our methods outperform vanilla multilingual fine-tuning.

* 5 pages

Via

Access Paper or Ask Questions

Cross-Lingual Natural Language Generation via Pre-Training

Sep 24, 2019

Zewen Chi, Li Dong, Furu Wei, Wenhui Wang, Xian-Ling Mao, Heyan Huang

Figure 1 for Cross-Lingual Natural Language Generation via Pre-Training

Figure 2 for Cross-Lingual Natural Language Generation via Pre-Training

Figure 3 for Cross-Lingual Natural Language Generation via Pre-Training

Figure 4 for Cross-Lingual Natural Language Generation via Pre-Training

Abstract:In this work we focus on transferring supervision signals of natural language generation (NLG) tasks between multiple languages. We propose to pretrain the encoder and decoder of a sequence-to-sequence model under both monolingual and cross-lingual settings. The pre-training objective encourages the model to represent different languages in the shared space, so that we can conduct zero-shot cross-lingual transfer. After the pre-training procedure, we use monolingual data to fine-tune the pre-trained model on downstream NLG tasks. Then the sequence-to-sequence model trained in a single language can be directly evaluated beyond that language (i.e., accepting multi-lingual input and producing multi-lingual output). Experimental results on question generation and abstractive summarization show that our model outperforms the machine-translation-based pipeline methods for zero-shot cross-lingual generation. Moreover, cross-lingual transfer improves NLG performance of low-resource languages by leveraging rich-resource language data.

* 10 pages

Via

Access Paper or Ask Questions

Complicated Table Structure Recognition

Aug 28, 2019

Zewen Chi, Heyan Huang, Heng-Da Xu, Houjin Yu, Wanxuan Yin, Xian-Ling Mao

Figure 1 for Complicated Table Structure Recognition

Figure 2 for Complicated Table Structure Recognition

Figure 3 for Complicated Table Structure Recognition

Figure 4 for Complicated Table Structure Recognition

Abstract:The task of table structure recognition aims to recognize the internal structure of a table, which is a key step to make machines understand tables. Currently, there are lots of studies on this task for different file formats such as ASCII text and HTML. It also attracts lots of attention to recognize the table structures in PDF files. However, it is hard for the existing methods to accurately recognize the structure of complicated tables in PDF files. The complicated tables contain spanning cells which occupy at least two columns or rows. To address the issue, we propose a novel graph neural network for recognizing the table structure in PDF files, named GraphTSR. Specifically, it takes table cells as input, and then recognizes the table structures by predicting relations among cells. Moreover, to evaluate the task better, we construct a large-scale table structure recognition dataset from scientific papers, named SciTSR, which contains 15,000 tables from PDF files and their corresponding structure labels. Extensive experiments demonstrate that our proposed model is highly effective for complicated tables and outperforms state-of-the-art baselines over a benchmark dataset and our new constructed dataset.

Via

Access Paper or Ask Questions