Sadao Kurohashi

SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation

Jul 31, 2023
Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi, Eiichiro Sumita

Sub-word segmentation is an essential pre-processing step for Neural Machine Translation (NMT). Existing work has shown that neural sub-word segmenters are better than Byte-Pair Encoding (BPE); however, they are inefficient, as they require parallel corpora, take days to train, and take hours to decode. This paper introduces SelfSeg, a self-supervised neural sub-word segmentation method that is much faster to train and decode and requires only monolingual dictionaries instead of parallel corpora. SelfSeg takes as input a word in the form of a partially masked character sequence, optimizes the word generation probability, and generates the segmentation with the maximum posterior probability, which is calculated using a dynamic programming algorithm. The training time of SelfSeg depends on word frequencies, and we explore several word frequency normalization strategies to accelerate the training phase. Additionally, we propose a regularization mechanism that allows the segmenter to generate various segmentations for one word. To show the effectiveness of our approach, we conduct MT experiments in low-, middle- and high-resource scenarios, comparing the performance of different segmentation methods. The experimental results demonstrate that on the low-resource ALT dataset, our method achieves improvements of more than 1.2 BLEU over BPE and SentencePiece, and of 1.1 BLEU over Dynamic Programming Encoding (DPE) and Vocabulary Learning via Optimal Transport (VOLT) on average. The regularization method achieves approximately a 4.3 BLEU improvement over BPE and a 1.2 BLEU improvement over BPE-dropout, the regularized version of BPE. We also observe significant improvements on the IWSLT15 Vi->En, WMT16 Ro->En and WMT15 Fi->En datasets, and competitive results on the WMT14 De->En and WMT14 Fr->En datasets.
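
The abstract mentions that the maximum-posterior segmentation is recovered with a dynamic programming algorithm. As a rough illustration of that idea only (not the paper's model), the sketch below assumes a hypothetical dictionary of subword log-probabilities and finds the best split with a Viterbi-style recursion.

```python
import math

def best_segmentation(word, subword_logprob, max_len=10):
    """Viterbi-style DP over split points: return the segmentation of `word`
    with the highest total log-probability. `subword_logprob` is a hypothetical
    dict of subword scores standing in for the trained segmenter's scores."""
    n = len(word)
    best = [-math.inf] * (n + 1)   # best[i]: best score of word[:i]
    back = [0] * (n + 1)           # back[i]: start index of the last subword
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = word[j:i]
            if piece in subword_logprob and best[j] + subword_logprob[piece] > best[i]:
                best[i] = best[j] + subword_logprob[piece]
                back[i] = j
    pieces, i = [], n
    while i > 0:                   # follow back-pointers to recover the split
        pieces.append(word[back[i]:i])
        i = back[i]
    return pieces[::-1]

# Toy scores, not learned ones.
scores = {"un": -1.0, "break": -1.5, "able": -1.2, "u": -4.0, "n": -4.0, "breakable": -5.0}
print(best_segmentation("unbreakable", scores))  # ['un', 'break', 'able']
```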

* Accepted to TALLIP journal 

MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting

May 26, 2023
Tatsuro Inaba, Hirokazu Kiyomaru, Fei Cheng, Sadao Kurohashi

Large language models (LLMs) have achieved impressive performance on various reasoning tasks. To further improve the performance, we propose MultiTool-CoT, a novel framework that leverages chain-of-thought (CoT) prompting to incorporate multiple external tools, such as a calculator and a knowledge retriever, during the reasoning process. We apply MultiTool-CoT to the Task 2 dataset of NumGLUE, which requires both numerical reasoning and domain-specific knowledge. The experiments show that our method significantly outperforms strong baselines and achieves state-of-the-art performance.
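
As a sketch of the general tool-augmented CoT loop described here (generate reasoning, detect a tool request, run the tool, feed the result back), the toy code below assumes a made-up trigger format such as `<<Calculator: 12*7>>` and a placeholder `call_llm` function; the authors' actual prompts and tool set are in the linked repository.

```python
import re

def calculator(expression: str) -> str:
    # Toy arithmetic tool; a real system would use a safer expression evaluator.
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"Calculator": calculator}
TRIGGER = re.compile(r"<<(\w+):\s*(.*?)>>")

def solve(question: str, call_llm) -> str:
    """Tool-augmented CoT loop. `call_llm` is a stand-in for an LLM completion
    function that stops right after emitting a tool trigger or a final answer."""
    context = f"Q: {question}\nLet's think step by step.\n"
    for _ in range(10):                      # cap the number of reasoning/tool turns
        step = call_llm(context)
        context += step
        match = TRIGGER.search(step)
        if match is None:                    # no tool requested: final answer reached
            break
        tool, arg = match.group(1), match.group(2)
        result = TOOLS[tool](arg) if tool in TOOLS else "unknown tool"
        context += f"\n{tool} result: {result}\n"   # feed the tool output back
    return context
```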

* ACL 2023. Our code is available at https://github.com/InabaTatsuro/MultiTool-CoT 

Towards Speech Dialogue Translation Mediating Speakers of Different Languages

May 22, 2023
Shuichiro Shimizu, Chenhui Chu, Sheng Li, Sadao Kurohashi

We present a new task, speech dialogue translation mediating speakers of different languages. We construct the SpeechBSD dataset for the task and conduct baseline experiments. Furthermore, we consider context to be an important aspect that needs to be addressed in this task and propose two ways of utilizing context, namely monolingual context and bilingual context. We conduct cascaded speech translation experiments using Whisper and mBART, and show that bilingual context performs better in our settings.
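
A cascaded baseline of the kind described (Whisper ASR followed by mBART MT, with conversational context prepended to the current utterance) could look roughly like the sketch below; the checkpoints, the Ja->En direction, and the context formatting are illustrative assumptions rather than the paper's configuration.

```python
from transformers import (MBart50TokenizerFast, MBartForConditionalGeneration,
                          WhisperForConditionalGeneration, WhisperProcessor)

asr_proc = WhisperProcessor.from_pretrained("openai/whisper-small")
asr = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
mt_tok = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
mt = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

def transcribe(audio, sampling_rate=16000):
    """ASR step: 16 kHz waveform (1-D float array) -> transcript string."""
    feats = asr_proc(audio, sampling_rate=sampling_rate, return_tensors="pt").input_features
    return asr_proc.batch_decode(asr.generate(feats), skip_special_tokens=True)[0]

def translate_with_context(utterance, context, src="ja_XX", tgt="en_XX"):
    """MT step: prepend previous utterances (simply space-joined here) so the
    model can use context to resolve phenomena such as omitted pronouns."""
    mt_tok.src_lang = src
    source = " ".join(context + [utterance])
    enc = mt_tok(source, return_tensors="pt")
    out = mt.generate(**enc, forced_bos_token_id=mt_tok.lang_code_to_id[tgt])
    return mt_tok.batch_decode(out, skip_special_tokens=True)[0]
```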

* 11 pages, 4 figures. Accepted to ACL 2023 Findings. Dataset: https://github.com/ku-nlp/speechBSD 

Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation

May 17, 2023
Zhuoyuan Mao, Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi

The language independence of encoded representations within multilingual neural machine translation (MNMT) models is crucial for their generalization ability on zero-shot translation. Neural interlingua representations have been shown to be an effective method for achieving this. However, the fixed-length neural interlingua representations introduced in previous work can limit their flexibility and representation ability. In this study, we introduce a novel method to enhance neural interlingua representations by making their length variable, thereby overcoming the constraint of fixed-length neural interlingua representations. Our empirical results on zero-shot translation on the OPUS, IWSLT, and Europarl datasets demonstrate stable model convergence and superior zero-shot translation results compared to fixed-length neural interlingua representations. However, our analysis reveals the suboptimal efficacy of our approach when translating from certain source languages, and we pinpoint the defective model component responsible in our proposed method.
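
To make the fixed- vs. variable-length contrast concrete, here is a generic attention-bridge sketch (not the paper's architecture): a set of learned query vectors cross-attends to the encoder states, and a variable-length variant simply scales the number of queries with the source length.

```python
import torch
import torch.nn as nn

class AttentionBridge(nn.Module):
    """Generic interlingua-style attention bridge. With length_ratio=None it
    behaves like a fixed-length interlingua; otherwise the number of query
    vectors grows with the source length. Illustrative only."""
    def __init__(self, d_model=512, n_heads=8, max_queries=64, length_ratio=None):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(max_queries, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.max_queries = max_queries
        self.length_ratio = length_ratio

    def forward(self, enc_states):                # enc_states: (batch, src_len, d_model)
        batch, src_len, _ = enc_states.shape
        if self.length_ratio is None:
            n = self.max_queries                  # fixed-length interlingua
        else:
            n = min(self.max_queries, max(1, int(src_len * self.length_ratio)))
        q = self.queries[:n].unsqueeze(0).expand(batch, -1, -1)
        interlingua, _ = self.attn(q, enc_states, enc_states)
        return interlingua                        # (batch, n, d_model)

# e.g. AttentionBridge(length_ratio=0.5)(torch.randn(2, 20, 512)).shape -> (2, 10, 512)
```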

* Accepted to Multi3Generation workshop (held in conjunction with EAMT 2023) 

Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation

May 16, 2023
Zhuoyuan Mao, Raj Dabre, Qianying Liu, Haiyue Song, Chenhui Chu, Sadao Kurohashi

This paper studies the impact of layer normalization (LayerNorm) on zero-shot translation (ZST). Recent efforts for ZST often utilize the Transformer architecture as the backbone, with LayerNorm at the input of layers (PreNorm) set as the default. However, Xu et al. (2019) revealed that PreNorm carries the risk of overfitting the training data. Based on this, we hypothesize that PreNorm may overfit supervised directions and thus have low generalizability for ZST. Through experiments on the OPUS, IWSLT, and Europarl datasets for 54 ZST directions, we demonstrate that the original Transformer setting of LayerNorm after residual connections (PostNorm) consistently outperforms PreNorm by up to 12.3 BLEU points. We then study the performance disparities by analyzing the differences in off-target rates and structural variations between PreNorm and PostNorm. This study highlights the need for careful consideration of the LayerNorm setting for ZST.
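
For readers who want the two LayerNorm placements side by side, a minimal PyTorch sketch of the residual blocks (with `sublayer` standing in for self-attention or the feed-forward network):

```python
import torch.nn as nn

class PreNormBlock(nn.Module):
    """LayerNorm applied to the sublayer input (PreNorm): x + f(LN(x))."""
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.norm, self.sublayer = nn.LayerNorm(d_model), sublayer

    def forward(self, x):
        return x + self.sublayer(self.norm(x))

class PostNormBlock(nn.Module):
    """LayerNorm applied after the residual connection (PostNorm, the
    original Transformer setting): LN(x + f(x))."""
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.norm, self.sublayer = nn.LayerNorm(d_model), sublayer

    def forward(self, x):
        return self.norm(x + self.sublayer(x))
```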

* Accepted to ACL 2023 main conference 

SuperDialseg: A Large-scale Dataset for Supervised Dialogue Segmentation

May 15, 2023
Junfeng Jiang, Chengzhang Dong, Akiko Aizawa, Sadao Kurohashi

Dialogue segmentation is a crucial task for dialogue systems, allowing a better understanding of conversational texts. Despite recent progress in unsupervised dialogue segmentation methods, their performance is limited by the lack of explicit supervised signals for training. Furthermore, the precise definition of segmentation points in conversations remains a challenging problem, increasing the difficulty of collecting manual annotations. In this paper, we provide a feasible definition of dialogue segmentation points with the help of document-grounded dialogues and release a large-scale supervised dataset called SuperDialseg, containing 9K dialogues based on two prevalent document-grounded dialogue corpora and inheriting their useful dialogue-related annotations. Moreover, we propose two models to exploit the dialogue characteristics, achieving state-of-the-art performance on SuperDialseg and showing good generalization ability on out-of-domain datasets. Additionally, we provide a benchmark including 20 models across four categories for the dialogue segmentation task with several proper evaluation metrics. Based on our empirical studies, we also provide some insights for the task of dialogue segmentation. We believe our work is an important step forward in the field of dialogue segmentation.
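
Dialogue and text segmentation are commonly scored with window-based metrics such as P_k; as an illustration of how such a metric works (not necessarily this benchmark's exact implementation), a small sketch:

```python
def pk(ref_labels, hyp_labels, k=None):
    """P_k segmentation error (Beeferman et al., 1999): slide a window of size k
    and count disagreements on whether the two ends fall in the same segment.
    Labels are per-utterance segment ids, e.g. [0, 0, 1, 1, 1, 2]."""
    n = len(ref_labels)
    if k is None:
        # conventional choice: half the average reference segment length
        k = max(1, round(n / (len(set(ref_labels)) * 2)))
    errors = sum(
        (ref_labels[i] == ref_labels[i + k]) != (hyp_labels[i] == hyp_labels[i + k])
        for i in range(n - k)
    )
    return errors / (n - k)

print(pk([0, 0, 1, 1, 1, 2], [0, 0, 0, 1, 1, 2]))  # lower is better
```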

* Datasets and codes are available at https://github.com/Coldog2333/SuperDialseg 

Comprehensive Solution Program Centric Pretraining for Table-and-Text Hybrid Numerical Reasoning

May 12, 2023
Qianying Liu, Dongsheng Yang, Wenjie Zhong, Fei Cheng, Sadao Kurohashi

Numerical reasoning over table-and-text hybrid passages, such as financial reports, poses significant challenges and has numerous potential applications. Noise and irrelevant variables in the model input have been a hindrance to its performance. Additionally, coarse-grained supervision of the whole solution program has impeded the model's ability to learn the underlying numerical reasoning process. In this paper, we propose three pretraining tasks that operate at both the whole program and sub-program level: Variable Integrity Ranking, which guides the model to focus on useful variables; Variable Operator Prediction, which decomposes the supervision into fine-grained single operator prediction; and Variable Keyphrase Masking, which encourages the model to identify key evidence that sub-programs are derived from. Experimental results demonstrate the effectiveness of our proposed methods, surpassing transformer-based model baselines.
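
As a toy picture of the fine-grained, sub-program-level supervision described (here, pairing each input variable with the operator of the step it feeds, in the spirit of Variable Operator Prediction), with an invented program syntax and example values:

```python
import re

# Hypothetical FinQA-style solution program; "#0" refers to the result of step 0.
program = "subtract(revenue_2020, revenue_2019), divide(#0, revenue_2019)"

def variable_operator_labels(program):
    """Map each input variable to the set of operators it participates in."""
    labels = {}
    for op, args in re.findall(r"(\w+)\(([^)]*)\)", program):
        for arg in (a.strip() for a in args.split(",")):
            if not arg.startswith("#"):        # skip references to previous steps
                labels.setdefault(arg, set()).add(op)
    return labels

print(variable_operator_labels(program))
# {'revenue_2020': {'subtract'}, 'revenue_2019': {'subtract', 'divide'}} (set order may vary)
```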

* 11 pages 

GPT-RE: In-context Learning for Relation Extraction using Large Language Models

May 03, 2023
Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Jiwei Li, Sadao Kurohashi

In spite of the potential for ground-breaking achievements offered by large language models (LLMs) (e.g., GPT-3), they still lag significantly behind fully-supervised baselines (e.g., fine-tuned BERT) in relation extraction (RE). This is due to the two major shortcomings of LLMs in RE: (1) low relevance regarding entity and relation in retrieved demonstrations for in-context learning; and (2) the strong inclination to wrongly classify NULL examples into other pre-defined labels. In this paper, we propose GPT-RE to bridge the gap between LLMs and fully-supervised baselines. GPT-RE successfully addresses the aforementioned issues by (1) incorporating task-specific entity representations in demonstration retrieval; and (2) enriching the demonstrations with gold label-induced reasoning logic. We evaluate GPT-RE on four widely-used RE datasets, and observe that GPT-RE achieves improvements over not only existing GPT-3 baselines, but also fully-supervised baselines. Specifically, GPT-RE achieves SOTA performances on the Semeval and SciERC datasets, and competitive performances on the TACRED and ACE05 datasets.
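
To visualize the two ingredients highlighted above (entity-aware demonstration retrieval and gold-label-induced reasoning in the demonstrations), here is a simplified prompt-construction sketch; the encoder, templates, and the canned reasoning line are placeholders rather than the paper's implementation.

```python
import numpy as np

def build_prompt(test_item, demo_pool, encode, k=5):
    """`encode` is a placeholder for an entity-aware encoder: it should embed the
    sentence together with its marked head/tail entities so retrieval reflects
    the entity pair and relation rather than surface similarity alone."""
    q = encode(test_item)
    demos = sorted(demo_pool, key=lambda d: -float(np.dot(q, encode(d))))[:k]
    lines = []
    for d in demos:
        lines.append(
            f"Sentence: {d['sentence']}\n"
            f"Entity pair: {d['head']} / {d['tail']}\n"
            # A canned line stands in for the gold-label-induced reasoning
            # described in the abstract.
            f"Reasoning: the context linking the entities supports '{d['label']}'.\n"
            f"Relation: {d['label']}\n"
        )
    lines.append(
        f"Sentence: {test_item['sentence']}\n"
        f"Entity pair: {test_item['head']} / {test_item['tail']}\n"
        "Relation:"
    )
    return "\n".join(lines)
```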

Textual Enhanced Contrastive Learning for Solving Math Word Problems

Nov 29, 2022
Yibin Shen, Qianying Liu, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi

Solving math word problems requires analysing the relations between quantities and an accurate understanding of contextual natural language information. Recent studies show that current models rely on shallow heuristics to predict solutions and can be easily misled by small textual perturbations. To address this problem, we propose a Textual Enhanced Contrastive Learning framework, which forces models to distinguish semantically similar examples that have different mathematical logic. We adopt a self-supervised strategy to enrich examples with subtle textual variance through textual reordering or problem re-construction. We then retrieve the samples that are hardest to differentiate from both the equation and textual perspectives and guide the model to learn their representations. Experimental results show that our method achieves state-of-the-art performance on both widely used benchmark datasets and carefully designed challenge datasets in English and Chinese. Our code and data are available at https://github.com/yiyunya/Textual_CL_MWP
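
A generic contrastive objective of the sort described (pull a problem representation toward a variant with the same mathematical logic, push it away from textually similar hard negatives with different logic) might be sketched as follows; this is an InfoNCE-style stand-in, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """anchor, positive: (d,) representations; negatives: list of (d,) tensors
    for hard negatives that are textually similar but logically different."""
    anchor = F.normalize(anchor, dim=-1)
    candidates = F.normalize(torch.stack([positive] + negatives), dim=-1)  # (1+n, d)
    logits = candidates @ anchor / temperature      # cosine similarities / temperature
    target = torch.tensor([0])                      # the positive sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), target)
```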

* Findings of EMNLP 2022 

ComSearch: Equation Searching with Combinatorial Strategy for Solving Math Word Problems with Weak Supervision

Oct 13, 2022
Qianying Liu, Wenyu Guan, Jianhao Shen, Fei Cheng, Sadao Kurohashi

Previous studies have introduced a weakly supervised paradigm for solving math word problems that requires only the answer value annotation. While these methods search for correct value equation candidates as pseudo labels, they search only a narrow sub-space of the enormous equation space. To address this problem, we propose ComSearch, a novel search algorithm with a combinatorial strategy that compresses the search space by excluding mathematically equivalent equations. The compression allows the search algorithm to enumerate all possible equations and obtain high-quality data. We investigate the noise in pseudo labels that hold wrong mathematical logic, which we refer to as the false-matching problem, and propose a ranking model to denoise the pseudo labels. Our approach provides a flexible framework that trains two existing supervised math word problem solvers with the pseudo labels, and both achieve state-of-the-art performance in the weak supervision task.
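
The core intuition (shrink the equation search space by keeping one representative per class of mathematically equivalent equations before matching candidates against the answer) can be pictured with the toy enumeration below, where equivalence is crudely approximated by evaluating candidates on random variable values; this is a conceptual illustration, not the ComSearch algorithm.

```python
import itertools
import random

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": lambda a, b: a / b if b else float("inf")}

def expressions(vars_):
    """All binary expressions combining every variable exactly once."""
    if len(vars_) == 1:
        yield vars_[0]
        return
    for i in range(1, len(vars_)):
        for left in itertools.combinations(vars_, i):
            right = [v for v in vars_ if v not in left]
            for l in expressions(list(left)):
                for r in expressions(right):
                    for op in OPS:
                        yield (op, l, r)

def evaluate(expr, env):
    if isinstance(expr, str):
        return env[expr]
    op, l, r = expr
    return OPS[op](evaluate(l, env), evaluate(r, env))

def dedup(exprs, vars_, trials=3):
    """Group expressions by their values on random inputs (a cheap proxy for
    mathematical equivalence) and keep one representative per group."""
    probes = [{v: random.uniform(1, 10) for v in vars_} for _ in range(trials)]
    seen, reps = set(), []
    for e in exprs:
        key = tuple(round(evaluate(e, p), 6) for p in probes)
        if key not in seen:
            seen.add(key)
            reps.append(e)
    return reps

vars_ = ["a", "b", "c"]
cands = list(expressions(vars_))
unique = dedup(cands, vars_)
print(len(cands), "candidates ->", len(unique), "equivalence classes")
```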

* 13 pages 