Liangyou Li

Universal Conditional Masked Language Pre-training for Neural Machine Translation

Mar 20, 2022

Pengfei Li, Liangyou Li, Meng Zhang, Minghao Wu, Qun Liu

Abstract: Pre-trained sequence-to-sequence models have significantly improved Neural Machine Translation (NMT). Different from prior works where pre-trained models usually adopt a unidirectional decoder, this paper demonstrates that pre-training a sequence-to-sequence model with a bidirectional decoder can produce notable performance gains for both Autoregressive and Non-autoregressive NMT. Specifically, we propose CeMAT, a conditional masked language model pre-trained on large-scale bilingual and monolingual corpora in many languages. We also introduce two simple but effective methods to enhance CeMAT: aligned code-switching & masking and dynamic dual-masking. We conduct extensive experiments and show that CeMAT can achieve significant performance improvements in all scenarios from low- to extremely high-resource languages, i.e., up to +14.4 BLEU on low-resource languages and +7.9 BLEU on average for Autoregressive NMT. For Non-autoregressive NMT, we demonstrate that it can also produce consistent performance gains, i.e., up to +5.3 BLEU. To the best of our knowledge, this is the first work to pre-train a unified model for fine-tuning on both NMT tasks. Code, data, and pre-trained models are available at https://github.com/huawei-noah/Pretrained-Language-Model/CeMAT

* Accepted to ACL 2022 Main conference

Triangular Transfer: Freezing the Pivot for Triangular Machine Translation

Mar 17, 2022

Meng Zhang, Liangyou Li, Qun Liu

Abstract: Triangular machine translation is a special case of low-resource machine translation where the language pair of interest has limited parallel data, but both languages have abundant parallel data with a pivot language. Naturally, the key to triangular machine translation is the successful exploitation of such auxiliary data. In this work, we propose a transfer-learning-based approach that utilizes all types of auxiliary data. As we train auxiliary source-pivot and pivot-target translation models, we initialize some parameters of the pivot side with a pre-trained language model and freeze them to encourage both translation models to work in the same pivot language space, so that they can be smoothly transferred to the source-target translation model. Experiments show that our approach can outperform previous ones.

* ACL 2022

Adversarial Parameter Defense by Multi-Step Risk Minimization

Sep 07, 2021

Zhiyuan Zhang, Ruixuan Luo, Xuancheng Ren, Qi Su, Liangyou Li, Xu Sun

Abstract: Previous studies demonstrate that DNNs are vulnerable to adversarial examples and that adversarial training can establish a defense against them. In addition, recent studies show that deep neural networks also exhibit vulnerability to parameter corruptions. The vulnerability of model parameters is of crucial value to the study of model robustness and generalization. In this work, we introduce the concept of parameter corruption and propose to leverage loss change indicators for measuring the flatness of the loss basin and the parameter robustness of neural networks. On this basis, we analyze parameter corruptions and propose the multi-step adversarial corruption algorithm. To enhance neural networks, we propose the adversarial parameter defense algorithm, which minimizes the average risk of multiple adversarial parameter corruptions. Experimental results show that the proposed algorithm can improve both the parameter robustness and the accuracy of neural networks.

* Neural Networks 144C (2021) pp. 154-163
* Accepted to Neural Networks. arXiv admin note: text overlap with arXiv:2006.05620

Uncertainty-Aware Balancing for Multilingual and Multi-Domain Neural Machine Translation Training

Sep 06, 2021

Minghao Wu, Yitong Li, Meng Zhang, Liangyou Li, Gholamreza Haffari, Qun Liu

Abstract: Learning a multilingual and multi-domain translation model is challenging, as heterogeneous and imbalanced data make the model converge inconsistently over different corpora in the real world. One common practice is to adjust the share of each corpus in training, so that the learning process is balanced and low-resource cases can benefit from high-resource ones. However, automatic balancing methods usually depend on intra- and inter-dataset characteristics, which are usually unknown or require human priors. In this work, we propose an approach, MultiUAT, that dynamically adjusts the training data usage based on the model's uncertainty on a small set of trusted clean data for multi-corpus machine translation. We experiment with two classes of uncertainty measures on multilingual (16 languages with 4 settings) and multi-domain settings (4 for in-domain and 2 for out-of-domain on English-German translation) and demonstrate that our approach MultiUAT substantially outperforms its baselines, including both static and dynamic strategies. We analyze the cross-domain transfer and show the deficiency of static and similarity-based methods.
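
The idea of steering data usage by model uncertainty can be illustrated with a toy sketch. This is not MultiUAT's actual update rule, which the abstract does not spell out. It assumes, purely for illustration, that uncertainty is measured as the mean predictive entropy on each corpus's trusted set and that sampling weights come from a temperature-scaled softmax over those scores. The function names, the `temperature` parameter, and the corpus names are hypothetical.

```python
import math
from typing import Dict, List


def token_entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of one predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def corpus_uncertainty(distributions: List[List[float]]) -> float:
    """Mean entropy over the model's predictions on one trusted set."""
    return sum(token_entropy(d) for d in distributions) / len(distributions)


def sampling_weights(
    uncertainty: Dict[str, float], temperature: float = 1.0
) -> Dict[str, float]:
    """Softmax over per-corpus uncertainty (an assumed balancing rule).

    Corpora on which the model is more uncertain receive a larger share
    of the next training batches.
    """
    top = max(uncertainty.values())  # subtract the max for numerical stability
    scaled = {
        name: math.exp((u - top) / temperature) for name, u in uncertainty.items()
    }
    total = sum(scaled.values())
    return {name: value / total for name, value in scaled.items()}


# Hypothetical uncertainty scores for two corpora.
weights = sampling_weights({"news": 0.2, "medical": 1.5}, temperature=1.0)
```

In this sketch a lower `temperature` concentrates sampling on the most uncertain corpus, while a higher one moves the weights toward uniform.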

* 15 pages, 4 figures, to appear at EMNLP 2021 main conference

The HW-TSC's Offline Speech Translation Systems for IWSLT 2021 Evaluation

Aug 09, 2021

Minghan Wang, Yuxia Wang, Chang Su, Jiaxin Guo, Yingtao Zhang, Yujia Liu, Min Zhang, Shimin Tao, Xingshan Zeng, Liangyou Li (+2 more)

Abstract: This paper describes our participation in the IWSLT 2021 offline speech translation task. Our system was built in cascade form, comprising a speaker diarization module, an Automatic Speech Recognition (ASR) module, and a Machine Translation (MT) module. We directly use the LIUM SpkDiarization tool as the diarization module. The ASR module is trained on three ASR datasets from different sources, by multi-source training, using a modified Transformer encoder. The MT module is pretrained on the large-scale WMT news translation dataset and fine-tuned on the TED corpus. Our method achieves a BLEU score of 24.6 on the 2021 test set.

Multilingual Speech Translation with Unified Transformer: Huawei Noah's Ark Lab at IWSLT 2021

Jun 22, 2021

Xingshan Zeng, Liangyou Li, Qun Liu

Abstract: This paper describes the system submitted to the IWSLT 2021 Multilingual Speech Translation (MultiST) task from Huawei Noah's Ark Lab. We use a unified transformer architecture for our MultiST model, so that the data from different modalities (i.e., speech and text) and different tasks (i.e., Speech Recognition, Machine Translation, and Speech Translation) can be exploited to enhance the model's ability. Specifically, speech and text inputs are first fed to different feature extractors to extract acoustic and textual features, respectively. Then, these features are processed by a shared encoder-decoder architecture. We apply several training techniques to improve the performance, including multi-task learning, task-level curriculum learning, and data augmentation. Our final system achieves significantly better results than bilingual baselines on supervised language pairs and yields reasonable results on zero-shot language pairs.

* IWSLT 2021

Learning Multilingual Representation for Natural Language Understanding with Enhanced Cross-Lingual Supervision

Jun 09, 2021

Yinpeng Guo, Liangyou Li, Xin Jiang, Qun Liu

Abstract: Recently, pre-training multilingual language models has shown great potential in learning multilingual representation, a crucial topic of natural language processing. Prior works generally use a single mixed attention (MA) module, following TLM (Conneau and Lample, 2019), for attending to intra-lingual and cross-lingual contexts equivalently and simultaneously. In this paper, we propose a network named decomposed attention (DA) as a replacement for MA. The DA consists of an intra-lingual attention (IA) and a cross-lingual attention (CA), which model intra-lingual and cross-lingual supervisions respectively. In addition, we introduce a language-adaptive re-weighting strategy during training to further boost the model's performance. Experiments on various cross-lingual natural language understanding (NLU) tasks show that the proposed architecture and learning strategy significantly improve the model's cross-lingual transferability.

RealTranS: End-to-End Simultaneous Speech Translation with Convolutional Weighted-Shrinking Transformer

Jun 09, 2021

Xingshan Zeng, Liangyou Li, Qun Liu

Abstract: End-to-end simultaneous speech translation (SST), which directly translates speech in one language into text in another language in real time, is useful in many scenarios but has not been fully investigated. In this work, we propose RealTranS, an end-to-end model for SST. To bridge the modality gap between speech and text, RealTranS gradually downsamples the input speech with interleaved convolution and unidirectional Transformer layers for acoustic modeling, and then maps speech features into text space with a weighted-shrinking operation and a semantic encoder. In addition, to improve the model performance in simultaneous scenarios, we propose a blank penalty to enhance the shrinking quality and a Wait-K-Stride-N strategy to allow local reranking during decoding. Experiments on public, widely used datasets show that RealTranS with the Wait-K-Stride-N strategy outperforms prior end-to-end models as well as cascaded models in diverse latency settings.
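
A read/write schedule in the spirit of Wait-K-Stride-N can be sketched as follows. The abstract does not define the policy precisely, so the sketch assumes one plausible reading: the decoder first reads K source units, then alternates between writing N target tokens and reading N further source units. Writing N tokens in one step is what would leave room for reranking locally among them. The function name and the `READ`/`WRITE` action labels are hypothetical.

```python
from typing import List


def wait_k_stride_n_schedule(
    num_source: int, num_target: int, k: int, n: int
) -> List[str]:
    """Build the action sequence for an assumed Wait-K-Stride-N policy.

    Assumption: read k source units first, then alternate between
    writing n target tokens and reading n more source units.
    """
    if k < 1 or n < 1:
        raise ValueError("k and n must be positive")

    actions: List[str] = []
    read = 0
    written = 0

    # Initial wait: read k source units (or all of them if fewer exist).
    while read < min(k, num_source):
        actions.append("READ")
        read += 1

    while written < num_target:
        # Write up to n tokens in one stride.
        for _ in range(min(n, num_target - written)):
            actions.append("WRITE")
            written += 1
        # Read up to n more source units if target tokens remain.
        if written < num_target:
            for _ in range(min(n, num_source - read)):
                actions.append("READ")
                read += 1

    return actions


# With n = 1 the schedule reduces to the familiar wait-k pattern.
schedule = wait_k_stride_n_schedule(num_source=5, num_target=4, k=2, n=2)
```

Once the source is exhausted, the remaining target tokens are written without further reads, which mirrors how a simultaneous decoder finishes after the input ends.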

* Accepted to Findings of ACL 2021

An Approach to Improve Robustness of NLP Systems against ASR Errors

Mar 25, 2021

Tong Cui, Jinghui Xiao, Liangyou Li, Xin Jiang, Qun Liu

Abstract: Speech-enabled systems typically first convert audio to text through an automatic speech recognition (ASR) model and then feed the text to downstream natural language processing (NLP) modules. The errors of the ASR system can seriously degrade the performance of the NLP modules. Therefore, it is essential to make them robust to ASR errors. Previous work has shown that it is effective to employ data augmentation methods to solve this problem by injecting ASR noise during the training process. In this paper, we utilize a prevalent pre-trained language model to generate training samples with ASR-plausible noise. Compared to previous methods, our approach generates ASR noise that better fits the real-world error distribution. Experimental results on spoken language translation (SLT) and spoken language understanding (SLU) show that our approach effectively improves system robustness against ASR errors and achieves state-of-the-art results on both tasks.

* 9 pages, 3 figures

Dependency Graph-to-String Statistical Machine Translation

Mar 20, 2021

Liangyou Li, Andy Way, Qun Liu

Abstract: We present graph-based translation models which translate source graphs into target strings. Source graphs are constructed from dependency trees with extra links so that non-syntactic phrases are connected. Inspired by phrase-based models, we first introduce a translation model which segments a graph into a sequence of disjoint subgraphs and generates a translation by combining subgraph translations left-to-right using beam search. However, similar to phrase-based models, this model is weak at phrase reordering. Therefore, we further introduce a model based on a synchronous node replacement grammar which learns recursive translation rules. We provide two implementations of the model with different restrictions so that source graphs can be parsed efficiently. Experiments on Chinese-English and German-English show that our graph-based models are significantly better than corresponding sequence- and tree-based baselines.
