Ngoc Thang Vu

BPE vs. Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages

Mar 16, 2022

Manuel Mager, Arturo Oncevay, Elisabeth Mager, Katharina Kann, Ngoc Thang Vu

Abstract: Morphologically rich polysynthetic languages present a challenge for NLP systems due to data sparsity, and a common strategy to handle this issue is to apply subword segmentation. We investigate a wide variety of supervised and unsupervised morphological segmentation methods for four polysynthetic languages: Nahuatl, Raramuri, Shipibo-Konibo, and Wixarika. Then, we compare the morphologically inspired segmentation methods against Byte-Pair Encodings (BPEs) as inputs for machine translation (MT) when translating to and from Spanish. We show that for all language pairs except for Nahuatl, an unsupervised morphological segmentation algorithm consistently outperforms BPEs and that, although supervised methods achieve better segmentation scores, they underperform in MT challenges. Finally, we contribute two new morphological segmentation datasets for Raramuri and Shipibo-Konibo, and a parallel corpus for Raramuri-Spanish.

* Accepted to Findings of ACL 2022

Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features

Mar 07, 2022

Florian Lux, Ngoc Thang Vu

Abstract: While neural text-to-speech systems perform remarkably well in high-resource scenarios, they cannot be applied to the majority of the over 6,000 spoken languages in the world due to a lack of appropriate training data. In this work, we use embeddings derived from articulatory vectors rather than embeddings derived from phoneme identities to learn phoneme representations that hold across languages. In conjunction with language-agnostic meta-learning, this enables us to fine-tune a high-quality text-to-speech model on just 30 minutes of data in a previously unseen language spoken by a previously unseen speaker.

* Accepted for the ACL 2022 main conference

Human Interpretation of Saliency-based Explanation Over Text

Jan 27, 2022

Hendrik Schuff, Alon Jacovi, Heike Adel, Yoav Goldberg, Ngoc Thang Vu

Abstract: While a lot of research in explainable AI focuses on producing effective explanations, less work is devoted to the question of how people understand and interpret the explanation. In this work, we focus on this question through a study of saliency-based explanations over textual data. Feature-attribution explanations of text models aim to communicate which parts of the input text were more influential than others towards the model decision. Many current explanation methods, such as gradient-based or Shapley value-based methods, provide measures of importance which are well-understood mathematically. But how does a person receiving the explanation (the explainee) comprehend it? And does their understanding match what the explanation attempted to communicate? We empirically investigate the effect of various factors of the input, the feature-attribution explanation, and the visualization procedure on laypeople's interpretation of the explanation. We query crowdworkers for their interpretation on tasks in English and German, and fit a GAMM model to their responses considering the factors of interest. We find that people often misinterpret the explanations: superficial and unrelated factors, such as word length, influence the explainees' importance assignment despite the explanation communicating importance directly. We then show that some of this distortion can be attenuated: we propose a method to adjust saliencies based on model estimates of over- and under-perception, and explore bar charts as an alternative to heatmap saliency visualization. We find that both approaches can attenuate the distorting effect of specific factors, leading to better-calibrated understanding of the explanation.

Integrating Knowledge in End-to-End Automatic Speech Recognition for Mandarin-English Code-Switching

Dec 19, 2021

Chia-Yu Li, Ngoc Thang Vu

Abstract: Code-Switching (CS) is a common linguistic phenomenon in multilingual communities that consists of switching between languages while speaking. This paper presents our investigations on end-to-end speech recognition for Mandarin-English CS speech. We analyse different CS-specific issues such as the property mismatches between languages in a CS language pair, the unpredictable nature of switching points, and the data scarcity problem. We exploit and improve the state-of-the-art end-to-end system by merging nonlinguistic symbols, by integrating language identification using hierarchical softmax, by modeling sub-word units, by artificially lowering the speaking rate, and by augmenting data using speed perturbation and several monolingual datasets. The aim is to improve the final performance not only on CS speech but also on monolingual benchmarks, so that the system is more applicable in real-life settings. Finally, we explore the effect of different language model integration methods on the performance of the proposed model. Our experimental results reveal that all the proposed techniques improve the recognition performance. The best combined system improves on the baseline system by up to 35% relative in terms of mixed error rate and delivers acceptable performance on monolingual benchmarks.

* The 2019 International Conference on Asian Language Processing (IALP)

Investigation of Densely Connected Convolutional Networks with Domain Adversarial Learning for Noise Robust Speech Recognition

Dec 19, 2021

Chia Yu Li, Ngoc Thang Vu

Abstract: We investigate densely connected convolutional networks (DenseNets) and their extension with domain adversarial training for noise-robust speech recognition. DenseNets are very deep, compact convolutional neural networks which have demonstrated incredible improvements over the state-of-the-art results in computer vision. Our experimental results reveal that DenseNets are more robust against noise than other neural-network-based models such as deep feed-forward neural networks and convolutional neural networks. Moreover, domain adversarial learning can further improve the robustness of DenseNets against both known and unknown noise conditions.

* 7 pages, 5 figures, The 30th Conference on Electronic Speech Signal Processing (ESSV2019)

Predicting User Code-Switching Level from Sociological and Psychological Profiles

Dec 13, 2021

Injy Hamed, Alia El Bolock, Nader Rizk, Cornelia Herbert, Slim Abdennadher, Ngoc Thang Vu

Abstract: Multilingual speakers tend to alternate between languages within a conversation, a phenomenon referred to as "code-switching" (CS). CS is a complex phenomenon that not only encompasses linguistic challenges, but also contains a great deal of complexity in terms of its dynamic behaviour across speakers. This dynamic behaviour has been studied by sociologists and psychologists, identifying factors affecting CS. In this paper, we provide an empirical user study on Arabic-English CS, where we show the correlation between users' CS frequency and character traits. We use machine learning (ML) to validate the findings, informing and confirming existing theories. The predictive models were able to predict users' CS frequency with an accuracy higher than 55%, where travel experiences and personality traits played the biggest role in the modeling process.

* To be published in the proceedings of the International Conference on Asian Language Information Processing

Improving Code-switching Language Modeling with Artificially Generated Texts using Cycle-consistent Adversarial Networks

Dec 12, 2021

Chia-Yu Li, Ngoc Thang Vu

Abstract: This paper presents our latest effort on improving code-switching language models that suffer from data scarcity. We investigate methods to augment code-switching training text data by generating it artificially. Concretely, we propose a framework based on cycle-consistent adversarial networks to transfer monolingual text into code-switching text, considering code-switching as a speaking style. Our experimental results on the SEAME corpus show that utilising artificially generated code-switching text data consistently improves the language model as well as the automatic speech recognition performance.

* 4 pages, 1 figure, Interspeech 2020

Improving Speech Recognition on Noisy Speech via Speech Enhancement with Multi-Discriminators CycleGAN

Dec 12, 2021

Chia-Yu Li, Ngoc Thang Vu

Abstract: This paper presents our latest investigations on improving automatic speech recognition for noisy speech via speech enhancement. We propose a novel method named Multi-discriminators CycleGAN to reduce noise in the input speech and therefore improve the automatic speech recognition performance. Our proposed method leverages the CycleGAN framework for speech enhancement without any parallel data and improves it by introducing multiple discriminators that check different frequency areas. Furthermore, we show that training multiple generators on homogeneous subsets of the training data is better than training one generator on all the training data. We evaluate our method on the CHiME-3 data set and observe up to 10.03% relative WER improvement on the development set and up to 14.09% on the evaluation set.

* 6 pages, 9 figures, ASRU 2021

ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

Nov 29, 2021

Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan (+3 more)

Abstract: As Automatic Speech Recognition (ASR) systems are getting better, there is an increasing interest in using the ASR output for downstream Natural Language Processing (NLP) tasks. However, there are few open-source toolkits that can be used to generate reproducible results on different Spoken Language Understanding (SLU) benchmarks. Hence, there is a need to build an open-source standard that enables a faster start in SLU research. We present ESPnet-SLU, which is designed for quick development of spoken language understanding in a single framework. ESPnet-SLU is a project inside the end-to-end speech processing toolkit ESPnet, which is a widely used open-source standard for various speech processing tasks like ASR, Text-to-Speech (TTS) and Speech Translation (ST). We enhance the toolkit to provide implementations for various SLU benchmarks that enable researchers to seamlessly mix and match different ASR and NLU models. We also provide pretrained models with intensively tuned hyper-parameters that can match or even outperform the current state-of-the-art performances. The toolkit is publicly available at https://github.com/espnet/espnet.

* Submitted to ICASSP 2022 (5 pages)

Does External Knowledge Help Explainable Natural Language Inference? Automatic Evaluation vs. Human Ratings

Oct 13, 2021

Hendrik Schuff, Hsiu-Yu Yang, Heike Adel, Ngoc Thang Vu

Abstract: Natural language inference (NLI) requires models to learn and apply commonsense knowledge. These reasoning abilities are particularly important for explainable NLI systems that generate a natural language explanation in addition to their label prediction. The integration of external knowledge has been shown to improve NLI systems; here we investigate whether it can also improve their explanation capabilities. For this, we investigate different sources of external knowledge and evaluate the performance of our models on in-domain data as well as on special transfer datasets that are designed to assess fine-grained reasoning capabilities. We find that different sources of knowledge have different effects on reasoning abilities; for example, implicit knowledge stored in language models can hinder reasoning on numbers and negations. Finally, we conduct the largest and most fine-grained explainable NLI crowdsourcing study to date. It reveals that even large differences in automatic performance scores are not reflected in human ratings of label, explanation, commonsense, or grammar correctness.

* BlackboxNLP @ EMNLP2021
