Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dongdong Zhang

Chain-of-Dictionary Prompting Elicits Translation in Large Language Models

May 24, 2023

Hongyuan Lu, Haoyang Huang, Dongdong Zhang, Haoran Yang, Wai Lam, Furu Wei

Figure 1 for Chain-of-Dictionary Prompting Elicits Translation in Large Language Models

Figure 2 for Chain-of-Dictionary Prompting Elicits Translation in Large Language Models

Figure 3 for Chain-of-Dictionary Prompting Elicits Translation in Large Language Models

Figure 4 for Chain-of-Dictionary Prompting Elicits Translation in Large Language Models

Abstract:Large language models (LLMs) have shown surprisingly good performance in multilingual neural machine translation (MNMT) even when trained without parallel data. Yet, despite the fact that the amount of training data is gigantic, they still struggle with translating rare words, particularly for low-resource languages. Even worse, it is usually unrealistic to retrieve relevant demonstrations for in-context learning with low-resource languages on LLMs, which restricts the practical use of LLMs for translation -- how should we mitigate this problem? To this end, we present a novel method, CoD, which augments LLMs with prior knowledge with the chains of multilingual dictionaries for a subset of input words to elicit translation abilities for LLMs. Extensive experiments indicate that augmenting ChatGPT with CoD elicits large gains by up to 13x chrF++ points for MNMT (3.08 to 42.63 for English to Serbian written in Cyrillic script) on FLORES-200 full devtest set. We further demonstrate the importance of chaining the multilingual dictionaries, as well as the superiority of CoD to few-shot demonstration for low-resource languages.

Via

Access Paper or Ask Questions

One-stop Training of Multiple Capacity Models

May 24, 2023

Lan Jiang, Haoyang Huang, Dongdong Zhang, Rui Jiang, Furu Wei

Figure 1 for One-stop Training of Multiple Capacity Models

Figure 2 for One-stop Training of Multiple Capacity Models

Figure 3 for One-stop Training of Multiple Capacity Models

Figure 4 for One-stop Training of Multiple Capacity Models

Abstract:Training models with varying capacities can be advantageous for deploying them in different scenarios. While high-capacity models offer better performance, low-capacity models require fewer computing resources for training and inference. In this work, we propose a novel one-stop training framework to jointly train high-capacity and low-capactiy models. This framework consists of two composite model architectures and a joint training algorithm called Two-Stage Joint-Training (TSJT). Unlike knowledge distillation, where multiple capacity models are trained from scratch separately, our approach integrates supervisions from different capacity models simultaneously, leading to faster and more efficient convergence. Extensive experiments on the multilingual machine translation benchmark WMT10 show that our method outperforms low-capacity baseline models and achieves comparable or better performance on high-capacity models. Notably, the analysis demonstrates that our method significantly influences the initial training process, leading to more efficient convergence and superior solutions.

Via

Access Paper or Ask Questions

Not All Metrics Are Guilty: Improving NLG Evaluation with LLM Paraphrasing

May 24, 2023

Tianyi Tang, Hongyuan Lu, Yuchen Eleanor Jiang, Haoyang Huang, Dongdong Zhang, Wayne Xin Zhao, Furu Wei

Figure 1 for Not All Metrics Are Guilty: Improving NLG Evaluation with LLM Paraphrasing

Figure 2 for Not All Metrics Are Guilty: Improving NLG Evaluation with LLM Paraphrasing

Figure 3 for Not All Metrics Are Guilty: Improving NLG Evaluation with LLM Paraphrasing

Figure 4 for Not All Metrics Are Guilty: Improving NLG Evaluation with LLM Paraphrasing

Abstract:Most research about natural language generation (NLG) relies on evaluation benchmarks with limited references for a sample, which may result in poor correlations with human judgements. The underlying reason is that one semantic meaning can actually be expressed in different forms, and the evaluation with a single or few references may not accurately reflect the quality of the model's hypotheses. To address this issue, this paper presents a novel method, named Para-Ref, to enhance existing evaluation benchmarks by enriching the number of references. We leverage large language models (LLMs) to paraphrase a single reference into multiple high-quality ones in diverse expressions. Experimental results on representative NLG tasks of machine translation, text summarization, and image caption demonstrate that our method can effectively improve the correlation with human evaluation for sixteen automatic evaluation metrics by +7.82% in ratio. We release the code and data at https://github.com/RUCAIBox/Para-Ref.

Via

Access Paper or Ask Questions

Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation

May 23, 2023

Minwoo Lee, Hyukhun Koh, Kang-il Lee, Dongdong Zhang, Minsung Kim, Kyomin Jung

Figure 1 for Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation

Figure 2 for Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation

Figure 3 for Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation

Figure 4 for Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation

Abstract:Gender bias is a significant issue in machine translation, leading to ongoing research efforts in developing bias mitigation techniques. However, most works focus on debiasing of bilingual models without consideration for multilingual systems. In this paper, we specifically target the unambiguous gender bias issue of multilingual machine translation models and propose a new mitigation method based on a novel perspective on the problem. We hypothesize that the gender bias in unambiguous settings is due to the lack of gender information encoded into the non-explicit gender words and devise a scheme to encode correct gender information into their latent embeddings. Specifically, we employ Gender-Aware Contrastive Learning, GACL, based on gender pseudo-labels to encode gender information on the encoder embeddings. Our method is target-language-agnostic and applicable to already trained multilingual machine translation models through post-fine-tuning. Through multilingual evaluation, we show that our approach improves gender accuracy by a wide margin without hampering translation performance. We also observe that incorporated gender information transfers and benefits other target languages regarding gender accuracy. Finally, we demonstrate that our method is applicable and beneficial to models of various sizes.

Via

Access Paper or Ask Questions

Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus

May 18, 2023

Yuchen Eleanor Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Mrinmaya Sachan, Ryan Cotterell

Figure 1 for Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus

Figure 2 for Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus

Figure 3 for Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus

Figure 4 for Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus

Abstract:Several recent papers claim human parity at sentence-level Machine Translation (MT), especially in high-resource languages. Thus, in response, the MT community has, in part, shifted its focus to document-level translation. Translating documents requires a deeper understanding of the structure and meaning of text, which is often captured by various kinds of discourse phenomena such as consistency, coherence, and cohesion. However, this renders conventional sentence-level MT evaluation benchmarks inadequate for evaluating the performance of context-aware MT systems. This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al. (2022). The new BWB annotation introduces four extra evaluation aspects, i.e., entity, terminology, coreference, and quotation, covering 15,095 entity mentions in both languages. Using these annotations, we systematically investigate the similarities and differences between the discourse structures of source and target languages, and the challenges they pose to MT. We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures. This gives us a new perspective on the challenges and opportunities in document-level MT. We make our resource publicly available to spur future research in document-level MT and the generalization to other language translation tasks.

* ACL 2023
* 9 pages. arXiv admin note: substantial text overlap with arXiv:2210.14667

Via

Access Paper or Ask Questions

On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation

May 18, 2023

Liang Chen, Shuming Ma, Dongdong Zhang, Furu Wei, Baobao Chang

Figure 1 for On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation

Figure 2 for On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation

Figure 3 for On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation

Figure 4 for On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation

Abstract:While multilingual neural machine translation has achieved great success, it suffers from the off-target issue, where the translation is in the wrong language. This problem is more pronounced on zero-shot translation tasks. In this work, we find that failing in encoding discriminative target language signal will lead to off-target and a closer lexical distance (i.e., KL-divergence) between two languages' vocabularies is related with a higher off-target rate. We also find that solely isolating the vocab of different languages in the decoder can alleviate the problem. Motivated by the findings, we propose Language Aware Vocabulary Sharing (LAVS), a simple and effective algorithm to construct the multilingual vocabulary, that greatly alleviates the off-target problem of the translation model by increasing the KL-divergence between languages. We conduct experiments on a multilingual machine translation benchmark in 11 languages. Experiments show that the off-target rate for 90 translation tasks is reduced from 29\% to 8\%, while the overall BLEU score is improved by an average of 1.9 points without extra training cost or sacrificing the supervised directions' performance. We release the code at \href{https://github.com/chenllliang/Off-Target-MNMT}{https://github.com/chenllliang/Off-Target-MNMT} for reproduction.

* Findings of ACL 2023

Via

Access Paper or Ask Questions

Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting

May 11, 2023

Haoyang Huang, Tianyi Tang, Dongdong Zhang, Wayne Xin Zhao, Ting Song, Yan Xia, Furu Wei

Figure 1 for Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting

Figure 2 for Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting

Figure 3 for Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting

Figure 4 for Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting

Abstract:Large language models (LLMs) demonstrate impressive multilingual capability, but their performance varies substantially across different languages. In this work, we introduce a simple yet effective method, called cross-lingual-thought prompting (XLT), to systematically improve the multilingual capability of LLMs. Specifically, XLT is a generic template prompt that stimulates cross-lingual and logical reasoning skills to enhance task performance across languages. We conduct comprehensive evaluations on 7 typical benchmarks related to reasoning, understanding, and generation tasks, covering both high-resource and low-resource languages. Experimental results show that XLT not only remarkably enhances the performance of various multilingual tasks but also significantly reduces the gap between the average performance and the best performance of each task in different languages. Notably, XLT brings over 10 points of average improvement in arithmetic reasoning and open-domain question-answering tasks.

Via

Access Paper or Ask Questions

On the Pareto Front of Multilingual Neural Machine Translation

Apr 07, 2023

Liang Chen, Shuming Ma, Dongdong Zhang, Furu Wei, Baobao Chang

Figure 1 for On the Pareto Front of Multilingual Neural Machine Translation

Figure 2 for On the Pareto Front of Multilingual Neural Machine Translation

Figure 3 for On the Pareto Front of Multilingual Neural Machine Translation

Figure 4 for On the Pareto Front of Multilingual Neural Machine Translation

Abstract:In this work, we study how the generalization performance of a given direction changes with its sampling ratio in Multilingual Neural Machine Translation (MNMT). By training over 200 multilingual models with various model sizes, directions, and total numbers of tasks, we find that scalarization leads to a multitask trade-off front that deviates from the traditional Pareto front when there exists data imbalance in the training corpus. That is, the performance of certain translation directions does not improve with the increase of its weight in the multi-task optimization objective, which poses a great challenge to improve the overall performance of all directions. Based on our observations, we propose the Double Power Law to predict the unique performance trade-off front in MNMT, which is robust across various languages, data adequacy, and the number of tasks. Finally, we formulate the sample ratio selection problem in MNMT as an optimization problem based on the Double Power Law, which achieves better performance than temperature searching and gradient manipulation methods using up to half of the total training budget in our experiments.

* 14 pages, 6 figures, code released at https://github.com/chenllliang/ParetoMNMT

Via

Access Paper or Ask Questions

Are More Layers Beneficial to Graph Transformers?

Mar 01, 2023

Haiteng Zhao, Shuming Ma, Dongdong Zhang, Zhi-Hong Deng, Furu Wei

Abstract:Despite that going deep has proven successful in many neural architectures, the existing graph transformers are relatively shallow. In this work, we explore whether more layers are beneficial to graph transformers, and find that current graph transformers suffer from the bottleneck of improving performance by increasing depth. Our further analysis reveals the reason is that deep graph transformers are limited by the vanishing capacity of global attention, restricting the graph transformer from focusing on the critical substructure and obtaining expressive features. To this end, we propose a novel graph transformer model named DeepGraph that explicitly employs substructure tokens in the encoded representation, and applies local attention on related nodes to obtain substructure based attention encoding. Our model enhances the ability of the global attention to focus on substructures and promotes the expressiveness of the representations, addressing the limitation of self-attention as the graph transformer deepens. Experiments show that our method unblocks the depth limitation of graph transformers and results in state-of-the-art performance across various graph benchmarks with deeper models.

* ICLR 2023

Via

Access Paper or Ask Questions

HanoiT: Enhancing Context-aware Translation via Selective Context

Jan 17, 2023

Jian Yang, Yuwei Yin, Shuming Ma, Liqun Yang, Hongcheng Guo, Haoyang Huang, Dongdong Zhang, Yutao Zeng, Zhoujun Li, Furu Wei

Abstract:Context-aware neural machine translation aims to use the document-level context to improve translation quality. However, not all words in the context are helpful. The irrelevant or trivial words may bring some noise and distract the model from learning the relationship between the current sentence and the auxiliary context. To mitigate this problem, we propose a novel end-to-end encoder-decoder model with a layer-wise selection mechanism to sift and refine the long document context. To verify the effectiveness of our method, extensive experiments and extra quantitative analysis are conducted on four document-level machine translation benchmarks. The experimental results demonstrate that our model significantly outperforms previous models on all datasets via the soft selection mechanism.

Via

Access Paper or Ask Questions