Zhaopeng Tu

A Template-based Method for Constrained Neural Machine Translation

May 23, 2022
Shuo Wang, Peng Li, Zhixing Tan, Zhaopeng Tu, Maosong Sun, Yang Liu

Machine translation systems are expected to cope with various types of constraints in many practical scenarios. While neural machine translation (NMT) has achieved strong performance in unconstrained cases, it is non-trivial to impose pre-specified constraints on the translation process of NMT models. Although many approaches have been proposed to address this issue, most existing methods cannot satisfy the following three desiderata at the same time: (1) high translation quality, (2) high match accuracy, and (3) low latency. In this work, we propose a template-based method that can yield results with high translation quality and match accuracy while maintaining the decoding speed. Our basic idea is to rearrange the generation of constrained and unconstrained tokens through a template. The generation and derivation of the template can be learned through one sequence-to-sequence training framework. Thus our method does not require any changes in the model architecture and the decoding algorithm, making it easy to apply. Experimental results show that the proposed template-based methods can outperform several representative baselines in lexically and structurally constrained translation tasks.

* 14 pages, 4 figures

Understanding and Mitigating the Uncertainty in Zero-Shot Translation

May 20, 2022
Wenxuan Wang, Wenxiang Jiao, Shuo Wang, Zhaopeng Tu, Michael R. Lyu

Zero-shot translation is a promising direction for building a comprehensive multilingual neural machine translation (MNMT) system. However, its quality is still not satisfactory due to off-target issues. In this paper, we aim to understand and alleviate the off-target issues from the perspective of uncertainty in zero-shot translation. By carefully examining the translation output and model confidence, we identify two uncertainties that are responsible for the off-target issues, namely, extrinsic data uncertainty and intrinsic model uncertainty. Based on the observations, we propose two light-weight and complementary approaches to denoise the training data for model training, and mask out the vocabulary of the off-target languages in inference. Extensive experiments on both balanced and unbalanced datasets show that our approaches significantly improve the performance of zero-shot translation over strong MNMT baselines. Qualitative analyses provide insights into where our approaches reduce off-target translations.

* work in progress
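
The inference-time vocabulary masking mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the per-language vocabulary is built or where the mask is applied, so the toy vocabulary, the `french_ids` set, and the helper names below are all hypothetical; the sketch only shows the general idea of setting off-target token scores to negative infinity before normalization.

```python
import math

def mask_off_target(logits, allowed_ids):
    """Return a copy of logits where tokens outside allowed_ids get -inf."""
    return [x if i in allowed_ids else -math.inf for i, x in enumerate(logits)]

def softmax(logits):
    """Numerically stable softmax; -inf entries receive probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy vocabulary shared by English and French.
vocab = ["<eos>", "hello", "world", "bonjour", "monde"]
# Assumption: the target language is French, so only these ids are allowed.
french_ids = {0, 3, 4}

# Unmasked scores prefer the English token "hello" (an off-target choice).
logits = [0.1, 2.0, 1.5, 1.8, 0.3]

probs = softmax(mask_off_target(logits, french_ids))
best = vocab[max(range(len(probs)), key=probs.__getitem__)]
print(best)
```

With the mask in place, the English tokens receive zero probability and the highest-scoring allowed token is selected instead.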

Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation

Mar 23, 2022
Zhiwei He, Xing Wang, Rui Wang, Shuming Shi, Zhaopeng Tu

Back-translation is a critical component of Unsupervised Neural Machine Translation (UNMT), which generates pseudo parallel data from target monolingual data. A UNMT model is trained on the pseudo parallel data with translated source, and translates natural source sentences in inference. The source discrepancy between training and inference hinders the translation performance of UNMT models. By carefully designing experiments, we identify two representative characteristics of the data gap in source: (1) style gap (i.e., translated vs. natural text style) that leads to poor generalization capability; (2) content gap that induces the model to produce hallucination content biased towards the target language. To narrow the data gap, we propose an online self-training approach, which simultaneously uses the pseudo parallel data {natural source, translated target} to mimic the inference scenario. Experimental results on several widely-used language pairs show that our approach outperforms two strong baselines (XLM and MASS) by remedying the style and content gaps.

* 13 pages, ACL 2022

Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation

Mar 16, 2022
Wenxuan Wang, Wenxiang Jiao, Yongchang Hao, Xing Wang, Shuming Shi, Zhaopeng Tu, Michael Lyu

In this paper, we present a substantial step in better understanding the SOTA sequence-to-sequence (Seq2Seq) pretraining for neural machine translation (NMT). We focus on studying the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT. By carefully designing experiments on three language pairs, we find that Seq2Seq pretraining is a double-edged sword: On one hand, it helps NMT models to produce more diverse translations and reduce adequacy-related translation errors. On the other hand, the discrepancies between Seq2Seq pretraining and NMT finetuning limit the translation quality (i.e., domain discrepancy) and induce the over-estimation issue (i.e., objective discrepancy). Based on these observations, we further propose simple and effective strategies, named in-domain pretraining and input adaptation to remedy the domain and objective discrepancies, respectively. Experimental results on several language pairs show that our approach can consistently improve both translation performance and model robustness upon Seq2Seq pretraining.

* Accepted by ACL 2022 main conference

On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation

Oct 05, 2021
Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu

Pre-training (PT) and back-translation (BT) are two simple and powerful methods to utilize monolingual data for improving the model performance of neural machine translation (NMT). This paper takes the first step to investigate the complementarity between PT and BT. We introduce two probing tasks for PT and BT respectively and find that PT mainly contributes to the encoder module while BT brings more benefits to the decoder. Experimental results show that PT and BT are nicely complementary to each other, establishing state-of-the-art performances on the WMT16 English-Romanian and English-Russian benchmarks. Through extensive analyses on sentence originality and word frequency, we also demonstrate that combining Tagged BT with PT is more helpful to their complementarity, leading to better translation quality. Source code is freely available at https://github.com/SunbowLiu/PTvsBT.

* Accepted to Findings of EMNLP 2021

On the Copying Behaviors of Pre-Training for Neural Machine Translation

Jul 17, 2021
Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu

Previous studies have shown that initializing neural machine translation (NMT) models with the pre-trained language models (LM) can speed up the model training and boost the model performance. In this work, we identify a critical side-effect of pre-training for NMT, which is due to the discrepancy between the training objectives of LM-based pre-training and NMT. Since the LM objective learns to reconstruct a few source tokens and copy most of them, the pre-training initialization would affect the copying behaviors of NMT models. We provide a quantitative analysis of copying behaviors by introducing a metric called copying ratio, which empirically shows that pre-training based NMT models have a larger copying ratio than the standard one. In response to this problem, we propose a simple and effective method named copying penalty to control the copying behaviors in decoding. Extensive experiments on both in-domain and out-of-domain benchmarks show that the copying penalty method consistently improves translation performance by controlling copying behaviors for pre-training based NMT models. Source code is freely available at https://github.com/SunbowLiu/CopyingPenalty.

* Accepted to Findings of ACL 2021
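
The abstract above names a copying ratio metric without defining it. As a rough illustration only, the sketch below assumes one plausible definition: the fraction of output tokens that also occur in the corresponding source sentence, counted with clipping and aggregated over a corpus. The function name, the whitespace tokenization, and the example sentences are assumptions, and the authors' actual formula may differ.

```python
from collections import Counter

def copying_ratio(sources, hypotheses):
    """Share of hypothesis tokens that also appear in the paired source.

    Assumed definition for illustration: clipped token overlap divided by
    the total number of hypothesis tokens, summed over all sentence pairs.
    """
    copied = 0
    total = 0
    for src, hyp in zip(sources, hypotheses):
        src_counts = Counter(src.split())
        hyp_counts = Counter(hyp.split())
        # A hypothesis token counts as copied at most as often as it
        # occurs in the source (Counter returns 0 for missing tokens).
        copied += sum(min(c, src_counts[t]) for t, c in hyp_counts.items())
        total += sum(hyp_counts.values())
    return copied / total if total else 0.0

# Hypothetical English-to-German pair: names and "in" are carried over.
sources = ["the Bundestag met in Berlin"]
hypotheses = ["der Bundestag tagte in Berlin"]
ratio = copying_ratio(sources, hypotheses)
print(ratio)
```

Under this assumed definition, a model that copies more source tokens into its output yields a higher ratio, which is the behavior the abstract reports for pre-training based models.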

Language Models are Good Translators

Jun 25, 2021
Shuo Wang, Zhaopeng Tu, Zhixing Tan, Wenxuan Wang, Maosong Sun, Yang Liu

Recent years have witnessed the rapid advance in neural machine translation (NMT), the core of which lies in the encoder-decoder architecture. Inspired by the recent progress of large-scale pre-trained language models on machine translation in a limited scenario, we first demonstrate that a single language model (LM4MT) can achieve comparable performance with strong encoder-decoder NMT models on standard machine translation benchmarks, using the same training data and a similar number of model parameters. LM4MT can also easily utilize source-side texts as additional supervision. Through modeling the source- and target-language texts with the same mechanism, LM4MT can provide unified representations for both source and target sentences, which can better transfer knowledge across languages. Extensive experiments on pivot-based and zero-shot translation tasks show that LM4MT can outperform the encoder-decoder NMT model by a large margin.

* 12 pages. Work in progress. An earlier version of this manuscript is under review

Progressive Multi-Granularity Training for Non-Autoregressive Translation

Jun 11, 2021
Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, Dacheng Tao, Zhaopeng Tu

Non-autoregressive translation (NAT) significantly accelerates the inference process via predicting the entire target sequence. However, recent studies show that NAT is weak at learning high-mode knowledge such as one-to-many translations. We argue that modes can be divided into various granularities which can be learned from easy to hard. In this study, we empirically show that NAT models are prone to learn fine-grained lower-mode knowledge, such as words and phrases, compared with sentences. Based on this observation, we propose progressive multi-granularity training for NAT. More specifically, to make the most of the training data, we break down the sentence-level examples into three types, i.e., words, phrases, and sentences, and progressively increase the granularity as training proceeds. Experiments on Romanian-English, English-German, Chinese-English, and Japanese-English demonstrate that our approach improves the phrase translation accuracy and model reordering ability, therefore resulting in better translation quality against strong NAT baselines. Also, we show that more deterministic fine-grained knowledge can further enhance performance.

* ACL 2021, Short Findings

Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation

Jun 09, 2021
Cunxiao Du, Zhaopeng Tu, Jing Jiang

We propose a new training objective named order-agnostic cross entropy (OaXE) for fully non-autoregressive translation (NAT) models. OaXE improves the standard cross-entropy loss to ameliorate the effect of word reordering, which is a common source of the critical multimodality problem in NAT. Concretely, OaXE removes the penalty for word order errors, and computes the cross entropy loss based on the best possible alignment between model predictions and target tokens. Since the log loss is very sensitive to invalid references, we leverage cross entropy initialization and loss truncation to ensure the model focuses on a good part of the search space. Extensive experiments on major WMT benchmarks show that OaXE substantially improves translation performance, setting new state of the art for fully NAT models. Further analyses show that OaXE alleviates the multimodality problem by reducing token repetitions and increasing prediction confidence. Our code, data, and trained models are available at https://github.com/tencent-ailab/ICML21_OAXE.

* ICML 2021 (Oral), Code at https://github.com/tencent-ailab/ICML21_OAXE
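
The core idea stated in the abstract above, computing cross entropy against the best possible alignment between predictions and target tokens, can be sketched on a toy example. This is not the released implementation: the sketch enumerates all orderings by brute force, which is only feasible for very short targets, whereas a practical version would need an efficient assignment solver such as the Hungarian algorithm. The cross-entropy initialization and loss truncation from the abstract are not shown, and all names and probabilities below are hypothetical.

```python
import itertools
import math

def cross_entropy(log_probs, target):
    """Standard position-wise cross entropy: -sum_i log p_i(target_i)."""
    return -sum(log_probs[pos][tok] for pos, tok in enumerate(target))

def order_agnostic_xe(log_probs, target):
    """Minimum cross entropy over all orderings of the target tokens.

    Brute-force stand-in for an assignment solver; O(n!) in target length.
    """
    best_loss, best_order = math.inf, None
    for order in itertools.permutations(target):
        loss = cross_entropy(log_probs, order)
        if loss < best_loss:
            best_loss, best_order = loss, order
    return best_loss, best_order

# Toy model output: 2 positions over a 3-token vocabulary.
probs = [
    [0.1, 0.8, 0.1],  # position 0 prefers token 1
    [0.7, 0.2, 0.1],  # position 1 prefers token 0
]
log_probs = [[math.log(p) for p in row] for row in probs]

# The reference order is (0, 1), but the model predicts the swapped order.
target = [0, 1]
xe = cross_entropy(log_probs, target)
oaxe, best_order = order_agnostic_xe(log_probs, target)
print(xe, oaxe, best_order)
```

Here the standard loss penalizes the model for producing the right tokens in a different order, while the order-agnostic loss is computed against the swapped ordering that best matches the predictions.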
