Zhuoyuan Mao

Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation

May 17, 2023
Zhuoyuan Mao, Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi


The language independence of encoded representations within multilingual neural machine translation (MNMT) models is crucial for their generalization ability on zero-shot translation. Neural interlingua representations have been shown to be an effective method for achieving this. However, the fixed-length neural interlingua representations introduced in previous work can limit their flexibility and representation ability. In this study, we introduce a novel method to enhance neural interlingua representations by making their length variable, thereby overcoming the constraint of fixed-length neural interlingua representations. Our empirical results on zero-shot translation on the OPUS, IWSLT, and Europarl datasets demonstrate stable model convergence and superior zero-shot translation quality compared to fixed-length neural interlingua representations. However, our analysis reveals that our approach is less effective when translating from certain source languages, and we pinpoint the model component responsible for this in our proposed method.
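
For intuition only, here is a minimal sketch (not the paper's implementation) of how a variable-length interlingua layer could look: a bank of learned query vectors cross-attends over the encoder states, and the number of queries used scales with the source length instead of being fixed. The module name, length ratio, and query cap are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

class VariableLengthInterlingua(nn.Module):
    """Hypothetical sketch: map encoder states to a length-dependent interlingua."""

    def __init__(self, d_model=512, n_heads=8, max_queries=64, length_ratio=0.5):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(max_queries, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.length_ratio = length_ratio

    def forward(self, enc_out, src_pad_mask):
        # enc_out: (batch, src_len, d_model); src_pad_mask: (batch, src_len), True = padding
        src_len = int((~src_pad_mask).sum(dim=1).max().item())
        k = max(1, min(self.queries.size(0), math.ceil(self.length_ratio * src_len)))
        q = self.queries[:k].unsqueeze(0).expand(enc_out.size(0), -1, -1)
        # The decoder would attend to this (batch, k, d_model) interlingua instead of enc_out.
        interlingua, _ = self.attn(q, enc_out, enc_out, key_padding_mask=src_pad_mask)
        return interlingua
```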

* Accepted to Multi3Generation workshop (held in conjunction with EAMT 2023) 

Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation

May 16, 2023
Zhuoyuan Mao, Raj Dabre, Qianying Liu, Haiyue Song, Chenhui Chu, Sadao Kurohashi


This paper studies the impact of layer normalization (LayerNorm) on zero-shot translation (ZST). Recent efforts for ZST often utilize the Transformer architecture as the backbone, with LayerNorm at the input of layers (PreNorm) set as the default. However, Xu et al. (2019) revealed that PreNorm carries the risk of overfitting the training data. Based on this, we hypothesize that PreNorm may overfit supervised directions and thus have low generalizability for ZST. Through experiments on the OPUS, IWSLT, and Europarl datasets for 54 ZST directions, we demonstrate that the original Transformer setting of LayerNorm after residual connections (PostNorm) consistently outperforms PreNorm by up to 12.3 BLEU points. We then study the performance disparities by analyzing the differences in off-target rates and structural variations between PreNorm and PostNorm. This study highlights the need for careful consideration of the LayerNorm setting for ZST.
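
The two LayerNorm placements compared in the paper are easy to pin down precisely; the sketch below shows both residual-block orderings (PostNorm: normalize after the residual addition; PreNorm: normalize at the sublayer input), with `sublayer` standing in for self-attention or the feed-forward network.

```python
import torch.nn as nn

class PostNormBlock(nn.Module):
    """Original Transformer ordering: x = LayerNorm(x + Sublayer(x))."""

    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(x + self.sublayer(x))

class PreNormBlock(nn.Module):
    """PreNorm ordering: x = x + Sublayer(LayerNorm(x))."""

    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return x + self.sublayer(self.norm(x))
```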

* Accepted to ACL 2023 main conference 

GPT-RE: In-context Learning for Relation Extraction using Large Language Models

May 03, 2023
Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Jiwei Li, Sadao Kurohashi


Despite the potential of large language models (LLMs) such as GPT-3 for ground-breaking achievements, they still lag significantly behind fully-supervised baselines (e.g., fine-tuned BERT) in relation extraction (RE). This is due to two major shortcomings of LLMs in RE: (1) the low relevance of retrieved demonstrations for in-context learning with respect to the entities and relation; and (2) a strong inclination to wrongly classify NULL examples into other pre-defined labels. In this paper, we propose GPT-RE to bridge the gap between LLMs and fully-supervised baselines. GPT-RE addresses these issues by (1) incorporating task-specific entity representations in demonstration retrieval; and (2) enriching the demonstrations with gold-label-induced reasoning logic. We evaluate GPT-RE on four widely used RE datasets and observe that GPT-RE achieves improvements over not only existing GPT-3 baselines but also fully-supervised baselines. Specifically, GPT-RE achieves state-of-the-art performance on the SemEval and SciERC datasets, and competitive performance on the TACRED and ACE05 datasets.
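
A minimal sketch of the two ideas named in the abstract, under loose assumptions: `embed_fn` stands for an entity-aware encoder (e.g., a fine-tuned RE model) used for demonstration retrieval, and each retrieved demonstration carries gold-label-induced reasoning text before its answer. The field names and prompt wording are illustrative, not the paper's exact format.

```python
import numpy as np

def retrieve_demonstrations(query, train_set, embed_fn, k=5):
    """Rank training examples by cosine similarity of entity-aware representations."""
    q = embed_fn(query["sentence"], query["head"], query["tail"])
    cands = np.stack([embed_fn(ex["sentence"], ex["head"], ex["tail"]) for ex in train_set])
    sims = cands @ q / (np.linalg.norm(cands, axis=1) * np.linalg.norm(q) + 1e-8)
    return [train_set[i] for i in np.argsort(-sims)[:k]]

def build_prompt(query, demos):
    """Assemble an in-context prompt whose demonstrations include reasoning for the gold label."""
    parts = []
    for ex in demos:
        parts.append(
            f"Sentence: {ex['sentence']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Relation between '{ex['head']}' and '{ex['tail']}': {ex['label']}\n"
        )
    parts.append(
        f"Sentence: {query['sentence']}\n"
        f"Relation between '{query['head']}' and '{query['tail']}':"
    )
    return "\n".join(parts)
```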


LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings with Knowledge Distillation

Feb 16, 2023
Zhuoyuan Mao, Tetsuji Nakagawa


Large-scale language-agnostic sentence embedding models such as LaBSE (Feng et al., 2022) obtain state-of-the-art performance for parallel sentence alignment. However, these large-scale models can suffer from slow inference and high computation overhead. This study systematically explores learning language-agnostic sentence embeddings with lightweight models. We demonstrate that a thin-deep encoder can construct robust low-dimensional sentence embeddings for 109 languages. With our proposed distillation methods, we achieve further improvements by incorporating knowledge from a teacher model. Empirical results on Tatoeba, United Nations, and BUCC show the effectiveness of our lightweight models. We release our lightweight language-agnostic sentence embedding models, LEALLA, on TensorFlow Hub.
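
As a rough illustration (not the released LEALLA code), distilling a large teacher into a low-dimensional student can combine a feature-matching term with a translation-pair contrastive term; `proj` is an assumed linear map from the teacher's embedding size down to the student's.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_src, student_tgt, teacher_src, proj, temperature=0.05):
    """Feature matching against the (projected) teacher plus a contrastive term on parallel pairs."""
    target = F.normalize(proj(teacher_src), dim=-1)
    s_src = F.normalize(student_src, dim=-1)
    s_tgt = F.normalize(student_tgt, dim=-1)
    feature_loss = F.mse_loss(s_src, target)
    logits = s_src @ s_tgt.t() / temperature   # parallel sentences sit on the diagonal
    labels = torch.arange(logits.size(0), device=logits.device)
    contrastive_loss = F.cross_entropy(logits, labels)
    return feature_loss + contrastive_loss
```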

* EACL 2023 main conference; LEALLA models: https://tfhub.dev/google/collections/LEALLA 

Textual Enhanced Contrastive Learning for Solving Math Word Problems

Nov 29, 2022
Yibin Shen, Qianying Liu, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi


Solving math word problems requires analyzing the relations between quantities and an accurate understanding of contextual natural language information. Recent studies show that current models rely on shallow heuristics to predict solutions and can easily be misled by small textual perturbations. To address this problem, we propose a Textual Enhanced Contrastive Learning framework, which forces the model to distinguish semantically similar examples that nonetheless follow different mathematical logic. We adopt a self-supervised strategy to enrich examples with subtle textual variance through textual reordering or problem re-construction. We then retrieve the samples that are hardest to differentiate from both the equation and textual perspectives and guide the model to learn their representations. Experimental results show that our method achieves state-of-the-art performance on both widely used benchmark datasets and carefully designed challenge datasets in English and Chinese. Our code and data are available at https://github.com/yiyunya/Textual_CL_MWP.
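
The hard-negative idea can be sketched as follows (an illustration under assumptions, not the released code): the anchor problem is pulled toward its textual variant and pushed away from its most similar problem that follows different mathematical logic.

```python
import torch
import torch.nn.functional as F

def hard_negative_contrastive(anchor, positive, negatives, margin=0.2):
    """Margin loss using the hardest (most similar) different-equation negative per anchor."""
    a = F.normalize(anchor, dim=-1)       # (batch, d) encoded original problems
    p = F.normalize(positive, dim=-1)     # (batch, d) reordered / reconstructed variants
    n = F.normalize(negatives, dim=-1)    # (batch, m, d) problems with different equations
    pos_sim = (a * p).sum(-1)                                   # (batch,)
    hard_neg_sim = torch.einsum("bd,bmd->bm", a, n).max(-1).values
    return F.relu(margin - pos_sim + hard_neg_sim).mean()
```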

* Findings of EMNLP 2022 

Seeking Diverse Reasoning Logic: Controlled Equation Expression Generation for Solving Math Word Problems

Sep 21, 2022
Yibin Shen, Qianying Liu, Zhuoyuan Mao, Zhen Wan, Fei Cheng, Sadao Kurohashi


To solve math word problems, human students leverage diverse reasoning logic that reaches different possible equation solutions. However, the mainstream sequence-to-sequence approach of automatic solvers aims to decode a fixed solution equation supervised by human annotation. In this paper, we propose a controlled equation generation solver that leverages a set of control codes to guide the model to consider certain reasoning logic and decode the corresponding equation expressions transformed from the human reference. The empirical results suggest that our method universally improves performance on single-unknown (Math23K) and multiple-unknown (DRAW1K, HMWP) benchmarks, with substantial improvements of up to 13.2% accuracy on the challenging multiple-unknown datasets.
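
The control-code mechanism can be pictured with a tiny data-preparation sketch: a code token describing the desired equation variant is prepended to the input so the decoder is conditioned on it. The token names below are hypothetical placeholders, not the paper's actual code inventory.

```python
def add_control_code(problem_tokens, equation_tokens, code):
    """Pair a problem with one equation variant, conditioned on a control code.

    code: e.g. "<reorder-terms>" or "<swap-unknowns>", selecting which equivalent
    transformation of the human-annotated reference equation the decoder should produce.
    """
    source = [code] + problem_tokens   # the encoder sees the control code first
    target = equation_tokens           # the equation variant consistent with `code`
    return source, target
```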

* AACL 2022 short paper 

EMS: Efficient and Effective Massively Multilingual Sentence Representation Learning

May 31, 2022
Zhuoyuan Mao, Chenhui Chu, Sadao Kurohashi


Massively multilingual sentence representation models, e.g., LASER, SBERT-distill, and LaBSE, help significantly improve cross-lingual downstream tasks. However, multiple training procedures, the use of large amounts of data, or inefficient model architectures make it computationally heavy to train a new model for one's preferred languages and domains. To resolve this issue, we introduce efficient and effective massively multilingual sentence representation learning (EMS), using cross-lingual sentence reconstruction (XTR) and sentence-level contrastive learning as training objectives. Compared with related studies, the proposed model can be trained efficiently using significantly fewer parallel sentences and GPU computation resources, without depending on large-scale pre-trained models. Empirical results show that the proposed model yields significantly better or comparable results with regard to bi-text mining, zero-shot cross-lingual genre classification, and sentiment classification. Ablative analyses demonstrate the effectiveness of each component of the proposed model. We release the code for model training and the EMS pre-trained model, which supports 62 languages (https://github.com/Mao-KU/EMS).
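
A compact sketch of how the two training objectives could be combined (assumptions: `xtr_logits` are token predictions for the target-language sentence generated from the source sentence embedding; this is an illustration, not the released EMS code).

```python
import torch
import torch.nn.functional as F

def ems_style_loss(src_emb, tgt_emb, xtr_logits, tgt_token_ids, pad_id=0, temperature=0.05):
    """Sentence-level contrastive term plus a cross-lingual reconstruction (XTR-style) term."""
    s = F.normalize(src_emb, dim=-1)
    t = F.normalize(tgt_emb, dim=-1)
    logits = s @ t.t() / temperature               # parallel pairs on the diagonal
    labels = torch.arange(logits.size(0), device=logits.device)
    contrastive = F.cross_entropy(logits, labels)
    reconstruction = F.cross_entropy(              # predict target tokens from the source embedding
        xtr_logits.flatten(0, 1), tgt_token_ids.flatten(), ignore_index=pad_id
    )
    return contrastive + reconstruction
```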

* This work is an extension of arXiv:2105.13856. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 

Relation Extraction with Weighted Contrastive Pre-training on Distant Supervision

May 18, 2022
Zhen Wan, Fei Cheng, Qianying Liu, Zhuoyuan Mao, Haiyue Song, Sadao Kurohashi


Contrastive pre-training on distant supervision has shown remarkable effectiveness for improving supervised relation extraction tasks. However, the existing methods ignore the intrinsic noise of distant supervision during the pre-training stage. In this paper, we propose a weighted contrastive learning method by leveraging the supervised data to estimate the reliability of pre-training instances and explicitly reduce the effect of noise. Experimental results on three supervised datasets demonstrate the advantages of our proposed weighted contrastive learning approach, compared to two state-of-the-art non-weighted baselines.
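
The weighting idea reduces to scaling each instance's contrastive loss by an estimated reliability; a minimal sketch (with hypothetical tensor names, not the paper's code) is shown below.

```python
import torch
import torch.nn.functional as F

def weighted_contrastive_loss(anchors, positives, weights, temperature=0.1):
    """In-batch contrastive loss where per-instance reliability weights downweight noisy pairs."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    per_example = F.cross_entropy(logits, labels, reduction="none")
    return (weights * per_example).sum() / weights.sum()
```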

* Under review 

When do Contrastive Word Alignments Improve Many-to-many Neural Machine Translation?

Apr 26, 2022
Zhuoyuan Mao, Chenhui Chu, Raj Dabre, Haiyue Song, Zhen Wan, Sadao Kurohashi


Word alignment has proven to benefit many-to-many neural machine translation (NMT). However, previous methods relied on high-quality ground-truth bilingual dictionaries for pre-editing, which are unavailable for most language pairs. Meanwhile, a contrastive objective can implicitly utilize automatically learned word alignment, which has not been explored in many-to-many NMT. This work proposes a word-level contrastive objective to leverage word alignments for many-to-many NMT. Empirical results show that this leads to 0.8 BLEU gains for several language pairs. Analyses reveal that in many-to-many NMT, the encoder's sentence retrieval performance correlates highly with translation quality, which explains when the proposed method improves translation. This motivates future work on improving the encoder's sentence retrieval performance for many-to-many NMT.
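
A word-level contrastive objective of this kind can be sketched as follows (an illustration under assumptions: alignments come from some automatic aligner, and encoder/decoder token states are already extracted).

```python
import torch
import torch.nn.functional as F

def word_contrastive_loss(src_states, tgt_states, align_pairs, temperature=0.1):
    """Aligned source/target token states are positives; all other target tokens are negatives.

    src_states, tgt_states: (num_tokens, d) token representations flattened over the batch
    align_pairs: LongTensor (n_pairs, 2) of (source_index, target_index) word alignments
    """
    s = F.normalize(src_states[align_pairs[:, 0]], dim=-1)
    t = F.normalize(tgt_states, dim=-1)
    logits = s @ t.t() / temperature
    return F.cross_entropy(logits, align_pairs[:, 1])
```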

* NAACL 2022 findings 

Linguistically-driven Multi-task Pre-training for Low-resource Neural Machine Translation

Jan 20, 2022
Zhuoyuan Mao, Chenhui Chu, Sadao Kurohashi


In the present study, we propose novel sequence-to-sequence pre-training objectives for low-resource neural machine translation (NMT): Japanese-specific sequence-to-sequence (JASS) for language pairs involving Japanese as the source or target language, and English-specific sequence-to-sequence (ENSS) for language pairs involving English. JASS focuses on masking and reordering Japanese linguistic units known as bunsetsu, whereas ENSS is proposed based on phrase structure masking and reordering tasks. Experiments on ASPEC Japanese--English and Japanese--Chinese, Wikipedia Japanese--Chinese, and News English--Korean corpora demonstrate that JASS and ENSS outperform MASS and other existing language-agnostic pre-training methods by up to +2.9 BLEU points for the Japanese--English tasks, up to +7.0 BLEU points for the Japanese--Chinese tasks, and up to +1.3 BLEU points for the English--Korean tasks. Empirical analysis, which focuses on the relationship between the individual parts of JASS and ENSS, reveals the complementary nature of their subtasks. Adequacy evaluation using LASER, human evaluation, and case studies reveal that our proposed methods significantly outperform pre-training methods without injected linguistic knowledge and that they have a larger positive impact on adequacy than on fluency. We release the code at https://github.com/Mao-KU/JASS/tree/master/linguistically-driven-pretraining.
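
The masking-and-reordering pre-training can be pictured with a small data-corruption sketch (the unit segmenter, bunsetsu for JASS or phrases for ENSS, is assumed and not shown; ratios and token names are illustrative).

```python
import random

def mask_and_reorder(units, mask_token="<mask>", mask_ratio=0.3, shuffle_ratio=0.3):
    """Corrupt a pre-segmented sentence by masking some units and shuffling others;
    the model is trained to reconstruct the original sentence from the corrupted input."""
    corrupted = [mask_token if random.random() < mask_ratio else u for u in units]
    n_shuffle = max(1, int(len(corrupted) * shuffle_ratio))
    idx = random.sample(range(len(corrupted)), n_shuffle)
    picked = [corrupted[i] for i in idx]
    random.shuffle(picked)
    for i, u in zip(idx, picked):
        corrupted[i] = u
    return " ".join(corrupted), " ".join(units)   # (corrupted source, reconstruction target)
```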

* TALLIP Volume 21, Issue 4, July 2022  
* An extension of work arXiv:2005.03361 