Yang Feng

Rephrasing the Reference for Non-Autoregressive Machine Translation

Nov 30, 2022
Chenze Shao, Jinchao Zhang, Jie Zhou, Yang Feng

Non-autoregressive neural machine translation (NAT) models suffer from the multi-modality problem: a source sentence may have multiple possible translations, so the reference sentence may be an inappropriate training target when the NAT output is closer to another translation. In response to this problem, we introduce a rephraser to provide a better training target for NAT by rephrasing the reference sentence according to the NAT output. As we train NAT based on the rephraser output rather than the reference sentence, the rephraser output should fit well with the NAT output and not deviate too far from the reference, which can be quantified as reward functions and optimized by reinforcement learning. Experiments on major WMT benchmarks and NAT baselines show that our approach consistently improves the translation quality of NAT. Specifically, our best variant achieves comparable performance to the autoregressive Transformer, while being 14.7 times more efficient in inference.

* AAAI 2023

Continual Learning of Neural Machine Translation within Low Forgetting Risk Regions

Nov 04, 2022
Shuhao Gu, Bojie Hu, Yang Feng

This paper considers continual learning of large-scale pretrained neural machine translation models without accessing the previous training data or introducing model separation. We argue that the widely used regularization-based methods, which perform multi-objective learning with an auxiliary loss, suffer from the misestimation problem and cannot always achieve a good balance between the previous and new tasks. To solve the problem, we propose a two-stage training method based on the local features of the real loss. We first search for low forgetting risk regions, where the model can retain its performance on the previous task as the parameters are updated, to avoid the catastrophic forgetting problem. Then we continually train the model within this region using only the new training data to fit the new task. Specifically, we propose two methods to search for the low forgetting risk regions, which are based on the curvature of the loss and the impact of the parameters on the model output, respectively. We conduct experiments on domain adaptation and more challenging language adaptation tasks, and the experimental results show that our method achieves significant improvements over several strong baselines.

* EMNLP 2022 Main Conference Long Paper

Counterfactual Data Augmentation via Perspective Transition for Open-Domain Dialogues

Oct 30, 2022
Jiao Ou, Jinchao Zhang, Yang Feng, Jie Zhou

The construction of open-domain dialogue systems requires high-quality dialogue datasets. Dialogue data admits a wide variety of responses for a given dialogue history, especially responses with different semantics. However, collecting such a high-quality dataset is labor-intensive and time-consuming in most scenarios. In this paper, we propose a data augmentation method to automatically augment high-quality responses with different semantics by counterfactual inference. Specifically, given an observed dialogue, our counterfactual generation model first infers semantically different responses by replacing the observed reply perspective with substituted ones. Furthermore, our data selection method filters out detrimental augmented responses. Experimental results show that our data augmentation method can augment high-quality responses with different semantics for a given dialogue history, and can outperform competitive baselines on multiple downstream tasks.

* Accepted at EMNLP 2022 (main conference)

TFormer: 3D Tooth Segmentation in Mesh Scans with Geometry Guided Transformer

Oct 29, 2022
Huimin Xiong, Kunle Li, Kaiyuan Tan, Yang Feng, Joey Tianyi Zhou, Jin Hao, Zuozhu Liu

Optical Intra-oral Scanners (IOS) are widely used in digital dentistry, providing 3-Dimensional (3D) and high-resolution geometrical information of dental crowns and the gingiva. Accurate 3D tooth segmentation, which aims to precisely delineate the tooth and gingiva instances in IOS, plays a critical role in a variety of dental applications. However, previous methods are error-prone at complicated tooth-tooth or tooth-gingiva boundaries and usually produce unsatisfactory results across patients, and their clinical applicability has not been verified on large-scale datasets. In this paper, we propose a novel method based on 3D transformer architectures that is evaluated with large-scale and high-resolution 3D IOS datasets. Our method, termed TFormer, captures both local and global dependencies among different teeth to distinguish various types of teeth with divergent anatomical structures and confusing boundaries. Moreover, we design a geometry guided loss based on a novel point curvature to exploit boundary geometric features, which helps refine the boundary predictions for more accurate and smooth segmentation. We further employ a multi-task learning scheme, where an additional teeth-gingiva segmentation head is introduced to improve the performance. Extensive experimental results on a large-scale dataset with 16,000 IOS, the largest IOS dataset to the best of our knowledge, demonstrate that TFormer surpasses existing state-of-the-art baselines by a large margin, with its utility in real-world scenarios verified by a clinical applicability test.

Improving Zero-Shot Multilingual Translation with Universal Representations and Cross-Mappings

Oct 28, 2022
Shuhao Gu, Yang Feng

A many-to-many multilingual neural machine translation model can translate between language pairs unseen during training, i.e., zero-shot translation. Improving zero-shot translation requires the model to learn universal representations and cross-mapping relationships to transfer the knowledge learned on the supervised directions to the zero-shot directions. In this work, we propose the state mover's distance, based on optimal transport theory, to model the difference between the representations output by the encoder. Then, we bridge the gap between the semantically equivalent representations of different languages at the token level by minimizing the proposed distance to learn universal representations. Besides, we propose an agreement-based training scheme, which helps the model make consistent predictions based on semantically equivalent sentences to learn universal cross-mapping relationships for all translation directions. The experimental results on diverse multilingual datasets show that our method improves consistently over the baseline system and other contrast methods. The analysis shows that our method can better align the semantic space and improve the prediction consistency.

* EMNLP 2022 Long Findings

Wait-info Policy: Balancing Source and Target at Information Level for Simultaneous Machine Translation

Oct 20, 2022
Shaolei Zhang, Shoutao Guo, Yang Feng

Simultaneous machine translation (SiMT) outputs the translation while receiving the source inputs, and hence needs to balance the received source information and translated target information to make a reasonable decision between waiting for inputs and outputting translation. Previous methods always balance source and target information at the token level, either directly waiting for a fixed number of tokens or adjusting the waiting based on the current token. In this paper, we propose a Wait-info Policy to balance source and target at the information level. We first quantify the amount of information contained in each token, named info. Then during simultaneous translation, the decision of waiting or outputting is made by comparing the total info of the previous target outputs with that of the received source inputs. Experiments show that our method outperforms strong baselines and achieves a better balance via the proposed info.

* Accepted to EMNLP 2022. 15 pages, 10 Figures, 6 Tables
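
The waiting decision described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `wait_info_schedule`, the threshold `k`, and the exact rule (emit a target token once the accumulated source info leads the accumulated target info by at least `k`) are assumptions made for illustration, and the per-token info values are taken as given rather than learned.

```python
from typing import List


def wait_info_schedule(src_info: List[float], tgt_info: List[float], k: float) -> List[int]:
    """Return, for each target token, how many source tokens were read before emitting it.

    Assumed rule (hypothetical): keep reading source tokens while
    (total received source info) - (total emitted target info) < k.
    """
    schedule = []
    read = 0          # number of source tokens received so far
    src_total = 0.0   # accumulated info of received source tokens
    tgt_total = 0.0   # accumulated info of emitted target tokens
    for info in tgt_info:
        # WAIT: read more source while the source lead is below the threshold.
        while read < len(src_info) and src_total - tgt_total < k:
            src_total += src_info[read]
            read += 1
        # OUTPUT: emit the next target token and account for its info.
        schedule.append(read)
        tgt_total += info
    return schedule
```

Under this assumed rule, setting every info value to 1 reduces the schedule to a fixed-lag, token-level policy, while uneven info values let the lag in tokens vary from step to step.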

Low-resource Neural Machine Translation with Cross-modal Alignment

Oct 13, 2022
Zhe Yang, Qingkai Fang, Yang Feng

How to achieve neural machine translation with limited parallel data? Existing techniques often rely on large-scale monolingual corpora, which is impractical for some low-resource languages. In this paper, we instead connect several low-resource languages to a particular high-resource one through an additional visual modality. Specifically, we propose a cross-modal contrastive learning method to learn a shared space for all languages, where both a coarse-grained sentence-level objective and a fine-grained token-level one are introduced. Experimental results and further analysis show that our method can effectively learn the cross-modal and cross-lingual alignment with a small number of image-text pairs and achieves significant improvements over the text-only baseline under both zero-shot and few-shot scenarios.

* Accepted to EMNLP 2022

Viterbi Decoding of Directed Acyclic Transformer for Non-Autoregressive Machine Translation

Oct 11, 2022
Chenze Shao, Zhengrui Ma, Yang Feng

Non-autoregressive models achieve significant decoding speedup in neural machine translation but lack the ability to capture sequential dependency. Directed Acyclic Transformer (DA-Transformer) was recently proposed to model sequential dependency with a directed acyclic graph. Consequently, it has to apply a sequential decision process at inference time, which harms the global translation accuracy. In this paper, we present a Viterbi decoding framework for DA-Transformer, which is guaranteed to find the jointly optimal solution for the translation and decoding path under any length constraint. Experimental results demonstrate that our approach consistently improves the performance of DA-Transformer while maintaining a similar decoding speedup.

* Findings of EMNLP 2022
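
To make the idea of a globally optimal path under a length constraint concrete, here is a small dynamic-programming sketch. It is an illustration under stated assumptions, not the paper's decoder: vertices are numbered so that edges only go forward, `trans[i][j]` holds assumed log transition probabilities, `emit[j]` holds the assumed log-probability of the best token at vertex `j`, and the names `best_path_with_length` and `log_or_neg_inf` are hypothetical.

```python
import math
from typing import List, Tuple

NEG_INF = float("-inf")


def best_path_with_length(trans: List[List[float]], emit: List[float],
                          length: int) -> Tuple[float, List[int]]:
    """Best-scoring path from vertex 0 to the last vertex that visits exactly `length` vertices."""
    n = len(emit)
    if length < 1:
        raise ValueError("length must be at least 1")
    # score[l][j]: best log-score of a path with l + 1 vertices from vertex 0 to vertex j.
    score = [[NEG_INF] * n for _ in range(length)]
    back = [[-1] * n for _ in range(length)]
    score[0][0] = emit[0]
    for l in range(1, length):
        for j in range(1, n):
            for i in range(j):  # edges only go forward in the DAG
                if score[l - 1][i] == NEG_INF or trans[i][j] == NEG_INF:
                    continue
                cand = score[l - 1][i] + trans[i][j] + emit[j]
                if cand > score[l][j]:
                    score[l][j] = cand
                    back[l][j] = i
    if score[length - 1][n - 1] == NEG_INF:
        raise ValueError("no path of the requested length reaches the last vertex")
    # Follow back-pointers from the last vertex to recover the path.
    path = [n - 1]
    for l in range(length - 1, 0, -1):
        path.append(back[l][path[-1]])
    path.reverse()
    return score[length - 1][n - 1], path


def log_or_neg_inf(p: float) -> float:
    return math.log(p) if p > 0 else NEG_INF


# Toy graph with 4 vertices (all numbers are made up for illustration).
probs = [
    [0.0, 0.6, 0.3, 0.1],
    [0.0, 0.0, 0.2, 0.8],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
trans = [[log_or_neg_inf(p) for p in row] for row in probs]
emit = [math.log(p) for p in [1.0, 0.5, 0.9, 1.0]]
score, path = best_path_with_length(trans, emit, length=3)
```

In this toy graph the most probable first transition is 0 to 1, yet the best three-vertex path is 0, 2, 3 once token scores are included, which is the kind of gap between step-by-step decisions and a joint optimum that the abstract refers to.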

Non-Monotonic Latent Alignments for CTC-Based Non-Autoregressive Machine Translation

Oct 08, 2022
Chenze Shao, Yang Feng

Non-autoregressive translation (NAT) models are typically trained with the cross-entropy loss, which forces the model outputs to be aligned verbatim with the target sentence and heavily penalizes small shifts in word positions. Latent alignment models relax the explicit alignment by marginalizing out all monotonic latent alignments with the CTC loss. However, they cannot handle non-monotonic alignments, which are non-negligible as there is typically global word reordering in machine translation. In this work, we explore non-monotonic latent alignments for NAT. We extend the alignment space to non-monotonic alignments to allow for global word reordering and further consider all alignments that overlap with the target sentence. We non-monotonically match the alignments to the target sentence and train the latent alignment model to maximize the F1 score of non-monotonic matching. Extensive experiments on major WMT benchmarks show that our method substantially improves the translation performance of CTC-based models. Our best model achieves 30.06 BLEU on WMT14 En-De with only one-iteration decoding, closing the gap between non-autoregressive and autoregressive models.

* NeurIPS 2022
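
As a rough illustration of an order-insensitive F1 score, the sketch below matches clipped n-gram counts between a discrete output and the target. This is an assumption about one simple way to realize such a match, not the paper's training objective, which is defined over latent alignments; the names `ngram_counts` and `ngram_f1` are hypothetical.

```python
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_f1(output: Sequence[str], target: Sequence[str], n: int = 2) -> float:
    """F1 of clipped n-gram matches; word positions are ignored."""
    out, tgt = ngram_counts(output, n), ngram_counts(target, n)
    matched = sum((out & tgt).values())          # multiset intersection clips counts
    total = sum(out.values()) + sum(tgt.values())
    # F1 = 2PR / (P + R) simplifies to 2 * matched / (|out| + |tgt|).
    return 2 * matched / total if total else 0.0


target = "a b c d".split()
reordered = "c d a b".split()
f1 = ngram_f1(reordered, target, n=2)
```

Here the reordered output shares no token with the target at any position, so a position-wise loss would treat it as entirely wrong, while two of its three bigrams still match.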

Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models

Sep 30, 2022
Ye Tian, Haolei Weng, Yang Feng

Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this work, we study the multi-task learning problem on GMMs, which aims to leverage potentially similar GMM parameter structures among tasks to obtain improved learning performance compared to single-task learning. We propose a multi-task GMM learning procedure based on the EM algorithm that not only can effectively utilize unknown similarity between related tasks but is also robust against a fraction of outlier tasks from arbitrary sources. The proposed procedure is shown to achieve the minimax optimal rate of convergence for both the parameter estimation error and the excess mis-clustering error, in a wide range of regimes. Moreover, we generalize our approach to tackle the problem of transfer learning for GMMs, where similar theoretical results are derived. Finally, we demonstrate the effectiveness of our methods through simulations and a real data analysis. To the best of our knowledge, this is the first work studying multi-task and transfer learning on GMMs with theoretical guarantees.

* 149 pages, 7 figures, 2 tables
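
For readers unfamiliar with the building block, the following is a minimal sketch of the standard single-task EM update for a one-dimensional GMM. It does not implement the multi-task or transfer procedure from the abstract; the function names, the toy data, and the variance floor of 1e-6 are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_step(data: List[float], weights: List[float], means: List[float],
            variances: List[float]) -> Tuple[List[float], List[float], List[float]]:
    """One EM iteration for a 1-D Gaussian mixture (plain single-task EM)."""
    k = len(weights)
    # E-step: responsibility of each component for each data point.
    resp = []
    for x in data:
        dens = [weights[c] * normal_pdf(x, means[c], variances[c]) for c in range(k)]
        total = sum(dens)
        resp.append([d / total for d in dens])
    # M-step: re-estimate weights, means, and variances from the responsibilities.
    new_w, new_m, new_v = [], [], []
    for c in range(k):
        nc = sum(r[c] for r in resp)
        mean = sum(r[c] * x for r, x in zip(resp, data)) / nc
        var = sum(r[c] * (x - mean) ** 2 for r, x in zip(resp, data)) / nc
        new_w.append(nc / len(data))
        new_m.append(mean)
        new_v.append(max(var, 1e-6))  # assumed floor to avoid a collapsed variance
    return new_w, new_m, new_v


# Toy data: two well-separated groups around -2 and +2 (made up for illustration).
data = [-2.2, -2.0, -1.8, -2.1, -1.9, 1.8, 2.0, 2.2, 1.9, 2.1]
weights, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
for _ in range(20):
    weights, means, variances = em_step(data, weights, means, variances)
```

A multi-task variant would run updates of this kind for several related datasets while coupling their parameters; how that coupling is done is specific to the paper and is not reproduced here.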
