Chenhui Chu

SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation

Jul 31, 2023
Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi, Eiichiro Sumita

Sub-word segmentation is an essential pre-processing step for Neural Machine Translation (NMT). Existing work has shown that neural sub-word segmenters outperform Byte-Pair Encoding (BPE); however, they are inefficient, as they require parallel corpora, days to train, and hours to decode. This paper introduces SelfSeg, a self-supervised neural sub-word segmentation method that is much faster to train and decode and requires only monolingual dictionaries instead of parallel corpora. SelfSeg takes as input a word in the form of a partially masked character sequence, optimizes the word generation probability, and generates the segmentation with the maximum posterior probability, computed with a dynamic programming algorithm. The training time of SelfSeg depends on word frequencies, and we explore several word-frequency normalization strategies to accelerate training. Additionally, we propose a regularization mechanism that allows the segmenter to generate multiple segmentations for one word. To show the effectiveness of our approach, we conduct MT experiments in low-, middle-, and high-resource scenarios, comparing the performance of different segmentation methods. On the low-resource ALT dataset, our method improves over BPE and SentencePiece by more than 1.2 BLEU points, and over Dynamic Programming Encoding (DPE) and Vocabulary Learning via Optimal Transport (VOLT) by 1.1 points on average. The regularization mechanism yields approximately a 4.3 BLEU point improvement over BPE and a 1.2 point improvement over BPE-dropout, the regularized version of BPE. We also observe significant improvements on the IWSLT15 Vi->En, WMT16 Ro->En, and WMT15 Fi->En datasets, and competitive results on the WMT14 De->En and WMT14 Fr->En datasets.

* Accepted to TALLIP journal 
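
The maximum-posterior segmentation step described above lends itself to a Viterbi-style dynamic program. Below is a minimal sketch, not the paper's implementation: `score` is a stand-in for the neural segmenter's subword log-probabilities (here a toy dictionary with a per-character back-off).

```python
import math

def best_segmentation(word, score, max_len=8):
    """Return (best_score, segments) for `word` under `score(subword)`."""
    n = len(word)
    best = [-math.inf] * (n + 1)  # best[i]: best log-prob of segmenting word[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last subword
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            s = best[j] + score(word[j:i])
            if s > best[i]:
                best[i], back[i] = s, j
    segs, i = [], n               # follow backpointers to recover the segments
    while i > 0:
        segs.append(word[back[i]:i])
        i = back[i]
    return best[n], segs[::-1]

# Toy scorer standing in for the neural model's subword log-probabilities.
toy_logprob = {"trans": -1.0, "lation": -1.2, "translation": -4.0}
score = lambda sw: toy_logprob.get(sw, -3.0 * len(sw))  # per-character back-off

print(best_segmentation("translation", score))  # -> (-2.2, ['trans', 'lation'])
```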

Reasoning before Responding: Integrating Commonsense-based Causality Explanation for Empathetic Response Generation

Jul 28, 2023
Yahui Fu, Koji Inoue, Chenhui Chu, Tatsuya Kawahara

Recent approaches to empathetic response generation incorporate commonsense knowledge or reasoning about the causes of emotions to better understand the user's experiences and feelings. However, these approaches mainly focus on understanding the causalities of the context from the user's perspective, ignoring the system's perspective. In this paper, we propose a commonsense-based causality explanation approach for diverse empathetic response generation that considers both the user's perspective (the user's desires and reactions) and the system's perspective (the system's intentions and reactions). We enhance ChatGPT's ability to reason about the system's perspective by integrating in-context learning with commonsense knowledge. We then integrate the commonsense-based causality explanation with both ChatGPT and a T5-based model. Experimental evaluations demonstrate that our method outperforms comparable methods in both automatic and human evaluations.
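
As a rough illustration of the "reasoning before responding" idea, here is a hypothetical sketch of how such an in-context prompt might be assembled; the exemplar, the field names, and the `call_llm` client are assumptions, not the authors' actual prompt.

```python
# Hypothetical sketch: elicit a commonsense causality explanation from both
# perspectives before generating the empathetic reply.

FEW_SHOT = '''Dialogue: "I failed my driving test again today."
User's desire: to pass the test and gain independence.
User's reaction: frustration and self-doubt.
System's intention: to comfort the user and encourage persistence.
System's reaction: sympathy for the setback.
Response: "That must be so discouraging, but you're getting closer each time."
'''

def build_prompt(dialogue: str) -> str:
    return (
        "Explain the emotional causality of the dialogue from both the "
        "user's and the system's perspective, then write an empathetic "
        "response.\n\n"
        + FEW_SHOT
        + f'\nDialogue: "{dialogue}"\n'
    )

def respond(dialogue: str, call_llm) -> str:
    # `call_llm` is any chat-completion function: prompt in, text out.
    return call_llm(build_prompt(dialogue))
```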

Towards Speech Dialogue Translation Mediating Speakers of Different Languages

May 22, 2023
Shuichiro Shimizu, Chenhui Chu, Sheng Li, Sadao Kurohashi

We present a new task: speech dialogue translation, which mediates between speakers of different languages. We construct the SpeechBSD dataset for the task and conduct baseline experiments. We further consider context an important aspect of this task and propose two ways of utilizing it: monolingual context and bilingual context. We conduct cascaded speech translation experiments using Whisper and mBART, and show that bilingual context performs better in our settings.

* 11 pages, 4 figures. Accepted to ACL 2023 Findings. Dataset: https://github.com/ku-nlp/speechBSD 
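
A minimal sketch of one plausible cascade with Hugging Face checkpoints follows; the model choices, the naive context concatenation, and the Ja->En direction are assumptions, and the paper's exact setup may differ.

```python
# Hypothetical cascaded speech translation pipeline: Whisper for ASR,
# mBART-50 for MT, with previously translated context prepended to the source.
from transformers import (WhisperProcessor, WhisperForConditionalGeneration,
                          MBart50TokenizerFast, MBartForConditionalGeneration)

asr_proc = WhisperProcessor.from_pretrained("openai/whisper-small")
asr = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
mt_tok = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
mt = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

def translate_turn(audio, context_en=""):
    """audio: 16 kHz waveform of one Japanese utterance; context_en:
    earlier translated turns (a crude stand-in for dialogue context)."""
    feats = asr_proc(audio, sampling_rate=16000, return_tensors="pt").input_features
    ja_text = asr_proc.batch_decode(asr.generate(feats), skip_special_tokens=True)[0]
    mt_tok.src_lang = "ja_XX"
    src = (context_en + " " + ja_text).strip()  # naive context concatenation
    enc = mt_tok(src, return_tensors="pt")
    out = mt.generate(**enc, forced_bos_token_id=mt_tok.lang_code_to_id["en_XX"])
    return mt_tok.batch_decode(out, skip_special_tokens=True)[0]
```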

Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation

May 17, 2023
Zhuoyuan Mao, Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi

The language independence of encoded representations within multilingual neural machine translation (MNMT) models is crucial for their generalization ability in zero-shot translation. Neural interlingua representations have been shown to be an effective way to achieve this. However, the fixed-length neural interlingua representations introduced in previous work limit flexibility and representation ability. In this study, we introduce a novel method that enhances neural interlingua representations by making their length variable, thereby overcoming the constraint of fixed-length representations. Our empirical results on zero-shot translation on the OPUS, IWSLT, and Europarl datasets demonstrate stable model convergence and superior zero-shot translation results compared with fixed-length neural interlingua representations. However, our analysis reveals that the approach is less effective when translating from certain source languages, and we pinpoint the defective model component responsible.

* Accepted to Multi3Generation workshop (held in conjunction with EAMT 2023) 
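
The contrast between fixed- and variable-length interlingua representations can be sketched as follows; this is a minimal PyTorch illustration under assumed design choices (learned query bank, one query per two source tokens), not the paper's architecture.

```python
import torch
import torch.nn as nn

class Interlingua(nn.Module):
    """Attend from a bank of interlingua queries over encoder states.
    `variable=True` scales the number of queries with the source length."""
    def __init__(self, d_model=512, n_heads=8, n_queries=16, variable=False):
        super().__init__()
        self.variable = variable
        self.queries = nn.Parameter(torch.randn(n_queries, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, enc_states):                  # (B, S, d_model)
        B, S, _ = enc_states.shape
        if self.variable:
            # Assumed heuristic: one query per two source tokens, capped.
            k = min(max(1, S // 2), self.queries.size(0))
        else:
            k = self.queries.size(0)                # fixed interlingua length
        q = self.queries[:k].unsqueeze(0).expand(B, -1, -1)
        out, _ = self.attn(q, enc_states, enc_states)
        return out                                  # (B, k, d_model)

enc = torch.randn(2, 7, 512)
print(Interlingua(variable=True)(enc).shape)   # torch.Size([2, 3, 512])
print(Interlingua(variable=False)(enc).shape)  # torch.Size([2, 16, 512])
```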

Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation

May 16, 2023
Zhuoyuan Mao, Raj Dabre, Qianying Liu, Haiyue Song, Chenhui Chu, Sadao Kurohashi

This paper studies the impact of layer normalization (LayerNorm) on zero-shot translation (ZST). Recent efforts on ZST often use the Transformer architecture as the backbone, with LayerNorm at the input of each layer (PreNorm) set as the default. However, Xu et al. (2019) revealed that PreNorm carries the risk of overfitting the training data. Based on this, we hypothesize that PreNorm may overfit supervised directions and thus generalize poorly to ZST. Through experiments on the OPUS, IWSLT, and Europarl datasets covering 54 ZST directions, we demonstrate that the original Transformer setting, with LayerNorm after the residual connections (PostNorm), consistently outperforms PreNorm by up to 12.3 BLEU points. We then study the performance disparities by analyzing the differences in off-target rates and structural variations between PreNorm and PostNorm. This study highlights the need for careful consideration of the LayerNorm setting for ZST.

* Accepted to ACL 2023 main conference 
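
For reference, the two LayerNorm placements compared in the paper differ only in where the norm sits relative to the residual connection; a generic Transformer sublayer wrapper (not the paper's code) makes this concrete.

```python
import torch.nn as nn

class SubLayer(nn.Module):
    """Wrap a sublayer (self-attention or FFN) with residual + LayerNorm."""
    def __init__(self, d_model, sublayer, pre_norm=True):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)
        self.pre_norm = pre_norm

    def forward(self, x):
        if self.pre_norm:                        # PreNorm:  x + f(LN(x))
            return x + self.sublayer(self.norm(x))
        return self.norm(x + self.sublayer(x))   # PostNorm: LN(x + f(x))
```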

Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks

Aug 23, 2022
Tianwei Chen, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Hajime Nagahara

Is more data always better for training vision-and-language models? We study knowledge transferability in multi-modal tasks. The current tendency in machine learning is to assume that combining datasets from different tasks will improve overall performance. However, we show that not all knowledge transfers well or has a positive impact on related tasks, even when the tasks share a common goal. We conduct an exhaustive analysis based on hundreds of cross-experiments on 12 vision-and-language tasks categorized into 4 groups. Although tasks in the same group might be expected to improve one another, our results show that this is not always the case. Other factors, such as dataset size and pre-training stage, also have a great impact on how well knowledge is transferred.

EMS: Efficient and Effective Massively Multilingual Sentence Representation Learning

May 31, 2022
Zhuoyuan Mao, Chenhui Chu, Sadao Kurohashi

Massively multilingual sentence representation models, e.g., LASER, SBERT-distill, and LaBSE, significantly improve cross-lingual downstream tasks. However, multiple training procedures, the use of large amounts of data, and inefficient model architectures make it computationally heavy to train a new model for one's preferred languages and domains. To resolve this issue, we introduce efficient and effective massively multilingual sentence representation learning (EMS), using cross-lingual sentence reconstruction (XTR) and sentence-level contrastive learning as training objectives. Compared with related studies, the proposed model can be trained efficiently, using significantly fewer parallel sentences and GPU computation resources, without depending on large-scale pre-trained models. Empirical results show that the proposed model yields significantly better or comparable results for bi-text mining, zero-shot cross-lingual genre classification, and sentiment classification. Ablation analyses demonstrate the effectiveness of each component of the proposed model. We release the code for model training and the EMS pre-trained model, which supports 62 languages (https://github.com/Mao-KU/EMS).

* This work is an extension of arXiv:2105.13856. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 
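
The sentence-level contrastive objective mentioned in the abstract can be sketched as a standard symmetric InfoNCE over parallel sentence embeddings; EMS's actual loss and temperature may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(src_emb, tgt_emb, temperature=0.05):
    """src_emb, tgt_emb: (B, d) embeddings of B parallel sentence pairs."""
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.t() / temperature      # (B, B) cosine similarities
    labels = torch.arange(src.size(0), device=src.device)
    # Aligned pairs (the diagonal) are positives; the in-batch rest are negatives.
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2
```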

When do Contrastive Word Alignments Improve Many-to-many Neural Machine Translation?

Apr 26, 2022
Zhuoyuan Mao, Chenhui Chu, Raj Dabre, Haiyue Song, Zhen Wan, Sadao Kurohashi

Word alignment has proven to benefit many-to-many neural machine translation (NMT). However, previous methods relied on high-quality ground-truth bilingual dictionaries for pre-editing, which are unavailable for most language pairs. Meanwhile, a contrastive objective can implicitly exploit automatically learned word alignment, which has not yet been explored in many-to-many NMT. This work proposes a word-level contrastive objective that leverages word alignments for many-to-many NMT. Empirical results show that this leads to gains of 0.8 BLEU for several language pairs. Analyses reveal that in many-to-many NMT, the encoder's sentence retrieval performance correlates highly with translation quality, which explains when the proposed method helps. This motivates future work on improving the encoder's sentence retrieval performance in many-to-many NMT.

* NAACL 2022 findings 
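
A rough sketch of such a word-level term follows (an assumption, not the paper's exact loss): contrast encoder states of automatically aligned word pairs and add the term to the usual NMT cross-entropy.

```python
import torch
import torch.nn.functional as F

def word_contrastive(src_words, tgt_words, temperature=0.1):
    """src_words, tgt_words: (N, d) encoder states of N aligned word pairs."""
    src = F.normalize(src_words, dim=-1)
    tgt = F.normalize(tgt_words, dim=-1)
    logits = src @ tgt.t() / temperature  # aligned pair = positive, rest = negatives
    labels = torch.arange(src.size(0), device=src.device)
    return F.cross_entropy(logits, labels)

# Assumed combined objective, with lam a weighting hyperparameter:
# total = nmt_cross_entropy + lam * word_contrastive(src_w, tgt_w)
```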

Fusion of Self-supervised Learned Models for MOS Prediction

Apr 11, 2022
Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Sheng Li, Raj Dabre, Raphael Rubino, Yi Zhao

We participated in the 2022 mean opinion score (MOS) prediction challenge. The challenge is to predict the MOS of synthetic speech on two tracks: the main track and a more challenging out-of-domain (OOD) sub-track. To improve prediction accuracy, we explored several model-fusion strategies and propose a fused framework incorporating seven pretrained self-supervised learning (SSL) models drawn from three ASR frameworks: Wav2Vec, HuBERT, and WavLM. For the OOD track, we kept the seven SSL models selected for the main track and adopted a semi-supervised learning method to exploit the unlabeled data. According to the official analysis, our system ranked 1st on 6 of 16 metrics and is among the top 3 systems on 13 of 16 metrics. Specifically, we achieved the highest system-level LCC, SRCC, and KTAU scores on the main track, as well as the best utterance-level LCC, SRCC, and KTAU scores on the OOD track. Compared with the individual SSL models, the fused system's prediction accuracy improved substantially, especially on the OOD sub-track.

* MOS 2022 shared task system description paper 
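
One plausible form of such a fusion (an illustration, not the submitted system) is a learned convex combination of the per-model MOS predictions.

```python
import torch
import torch.nn as nn

class FusedMOS(nn.Module):
    """Fuse MOS predictions from several SSL-based regressors."""
    def __init__(self, predictors):
        super().__init__()
        # Each predictor maps a waveform batch to a (B,) MOS estimate,
        # e.g. a regression head on top of wav2vec / HuBERT / WavLM features.
        self.predictors = nn.ModuleList(predictors)
        self.weights = nn.Parameter(torch.zeros(len(predictors)))

    def forward(self, wav):
        scores = torch.stack([p(wav) for p in self.predictors], dim=-1)  # (B, K)
        return scores @ torch.softmax(self.weights, dim=0)               # (B,)
```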

VISA: An Ambiguous Subtitles Dataset for Visual Scene-Aware Machine Translation

Jan 21, 2022
Yihang Li, Shuichiro Shimizu, Weiqi Gu, Chenhui Chu, Sadao Kurohashi

Existing multimodal machine translation (MMT) datasets pair images or videos with captions or general subtitles, which rarely contain linguistic ambiguity, so the visual information contributes little to generating appropriate translations. We introduce VISA, a new dataset consisting of 40k Japanese-English parallel sentence pairs and corresponding video clips, with the following key features: (1) the parallel sentences are subtitles from movies and TV episodes; (2) the source subtitles are ambiguous, i.e., they have multiple possible translations with different meanings; (3) the dataset is divided into Polysemy and Omission portions according to the cause of ambiguity. We show that VISA is challenging for the latest MMT system, and we hope that the dataset can facilitate MMT research.

* Submitted to LREC2022 
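
The structure described above suggests a record layout along these lines; this is purely hypothetical, and the released dataset's actual schema, field names, and paths may differ.

```python
# Hypothetical layout of one VISA example (illustration only).
example = {
    "id": "movie_0123_utt_45",
    "ja_subtitle": "...",            # ambiguous source subtitle
    "en_subtitle": "...",            # reference translation
    "video_clip": "clips/0123_45.mp4",
    "ambiguity_type": "Polysemy",    # or "Omission"
}
```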