Shuyue Stella Li

Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning

May 31, 2023
Shuyue Stella Li, Cihan Xiao, Tianjian Li, Bismarck Odoom

Code-switching, also called code-mixing, is the linguistic phenomenon in which multilingual speakers, in casual settings, mix words from different languages within a single utterance. Due to its spontaneous nature, code-switching is extremely low-resource, which makes it a challenging problem for language and speech processing tasks. In such contexts, Code-Switching Language Identification (CSLID) becomes a difficult but necessary task if we want to maximally leverage existing monolingual tools for other tasks. In this work, we propose two novel approaches toward improving language identification accuracy on an English-Mandarin child-directed speech dataset. Our methods include a stacked Residual CNN+GRU model and a multitask pre-training approach that uses Automatic Speech Recognition (ASR) as an auxiliary task for CSLID. Because of the low-resource nature of code-switching, we also employ careful silver-data creation from monolingual corpora in both languages and up-sampling as data augmentation. We focus on English-Mandarin code-switched data, but our method works for any language pair. Our best model achieves a balanced accuracy of 0.781 on a real English-Mandarin code-switching child-directed speech corpus and outperforms the previous baseline by 55.3%.

* 8 pages, 3 figures, 7 tables 
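
The abstract stops short of implementation detail; the sketch below is a minimal, hedged illustration of what a stacked residual CNN + GRU language identifier with an auxiliary ASR (CTC) head for multitask pre-training could look like in PyTorch. The layer sizes, number of blocks, and two-head layout are assumptions for illustration, not the authors' released architecture.

```python
# Hypothetical sketch of a stacked residual CNN + GRU CSLID model with an
# auxiliary ASR (CTC) head for multitask pre-training. All sizes and the exact
# head layout are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.norm = nn.BatchNorm1d(channels)
        self.act = nn.ReLU()

    def forward(self, x):              # x: (batch, channels, time)
        residual = x
        x = self.act(self.conv1(x))
        x = self.norm(self.conv2(x))
        return self.act(x + residual)  # skip connection

class MultitaskCSLID(nn.Module):
    def __init__(self, n_feats=80, hidden=256, n_langs=3, vocab_size=5000):
        super().__init__()
        self.proj = nn.Conv1d(n_feats, hidden, kernel_size=1)
        self.cnn = nn.Sequential(*[ResidualConvBlock(hidden) for _ in range(3)])
        self.gru = nn.GRU(hidden, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.lid_head = nn.Linear(2 * hidden, n_langs)     # frame-level LID logits
        self.asr_head = nn.Linear(2 * hidden, vocab_size)  # auxiliary CTC logits

    def forward(self, feats):          # feats: (batch, time, n_feats)
        x = self.cnn(self.proj(feats.transpose(1, 2))).transpose(1, 2)
        x, _ = self.gru(x)
        return self.lid_head(x), self.asr_head(x).log_softmax(-1)
```

In such a setup, pre-training would presumably optimize both heads jointly (cross-entropy on the LID head plus CTC on the ASR head), after which the LID head alone is fine-tuned on the code-switched corpus; the abstract does not specify the exact loss weighting.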

Condensing Multilingual Knowledge with Lightweight Language-Specific Modules

May 23, 2023
Haoran Xu, Weiting Tan, Shuyue Stella Li, Yunmo Chen, Benjamin Van Durme, Philipp Koehn, Kenton Murray

Incorporating language-specific (LS) modules is a proven method for boosting performance in multilingual machine translation. This approach bears similarity to Mixture-of-Experts (MoE) because it does not inflate FLOPs. However, scaling this approach to hundreds of languages (experts) becomes unmanageable due to the prohibitive number of parameters introduced by the full-rank matrices in fully connected layers. In this work, we introduce the Language-Specific Matrix Synthesis (LMS) method. This approach constructs LS modules by generating low-rank matrices from two significantly smaller matrices that approximate the full-rank matrix. Furthermore, we condense multilingual knowledge from multiple LS modules into a single shared module with the Fuse Distillation (FD) technique, improving the efficiency of inference and model serialization. We show that our LMS method significantly outperforms previous LS methods and MoE methods with the same amount of extra parameters, e.g., by 1.73 BLEU points over the Switch Transformer on many-to-many multilingual machine translation. Importantly, LMS achieves comparable translation performance with far fewer parameters.
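
As a rough illustration of the low-rank construction described in the abstract, the sketch below adds a per-language update formed by the product of two small factor matrices on top of a shared linear layer. The rank, initialization, and wiring are assumptions for illustration, not the paper's LMS implementation.

```python
# Minimal sketch of low-rank language-specific (LS) modules: each language owns
# two small factors A (d_in x r) and B (r x d_out) whose product approximates a
# full-rank d_in x d_out matrix. Rank and wiring are illustrative assumptions.
import torch
import torch.nn as nn

class LowRankLSLinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, n_langs: int, rank: int = 16):
        super().__init__()
        self.shared = nn.Linear(d_in, d_out)  # shared projection used by all languages
        self.A = nn.Parameter(torch.randn(n_langs, d_in, rank) * 0.02)
        self.B = nn.Parameter(torch.randn(n_langs, rank, d_out) * 0.02)

    def forward(self, x: torch.Tensor, lang_id: int) -> torch.Tensor:
        # Apply the two low-rank factors of the selected language on top of the
        # shared layer; the full-rank LS matrix is never materialized.
        ls_update = (x @ self.A[lang_id]) @ self.B[lang_id]
        return self.shared(x) + ls_update
```

For d_in = d_out = 1024 and rank 16, each added language costs roughly 33K parameters instead of about 1M for a full-rank matrix, which is the scaling argument the abstract makes.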

Language Agnostic Code-Mixing Data Augmentation by Predicting Linguistic Patterns

Nov 14, 2022
Shuyue Stella Li, Kenton Murray

In this work, we focus on intrasentential code-mixing and propose several Synthetic Code-Mixing (SCM) data augmentation methods that outperform the baseline on downstream sentiment analysis tasks across varying amounts of labeled gold data. Most importantly, our proposed methods demonstrate that strategically replacing parts of sentences in the matrix language with a constant mask significantly improves classification accuracy, motivating further linguistic insight into the phenomenon of code-mixing. We test our data augmentation method in a variety of low-resource and cross-lingual settings, achieving a relative improvement of up to 7.73% on the extremely scarce English-Malayalam dataset. We conclude that the code-switching pattern within code-mixed sentences is itself an important signal for the model to learn. Finally, we propose a language-agnostic SCM algorithm that is cheap yet extremely helpful for low-resource languages.

* 12 pages, 5 figures 
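
To make the masking idea concrete, here is a toy sketch of an SCM-style augmentation that replaces selected matrix-language tokens with a constant mask token. The random token-selection heuristic and the mask symbol are invented for illustration; they do not reproduce the paper's linguistically informed replacement strategy.

```python
# Toy sketch of synthetic code-mixing by masking: randomly chosen tokens of a
# matrix-language sentence are replaced with a constant mask token, standing in
# for embedded-language material. The selection heuristic is an assumption.
import random

MASK = "[MASK]"

def synthetic_code_mix(sentence, switch_prob=0.2, seed=None):
    rng = random.Random(seed)
    tokens = sentence.split()
    mixed = [MASK if rng.random() < switch_prob else tok for tok in tokens]
    return " ".join(mixed)

# Prints the sentence with roughly 40% of its tokens replaced by [MASK].
print(synthetic_code_mix("this movie was surprisingly good", switch_prob=0.4, seed=0))
```

The augmented sentences would then be mixed into the gold training data for the downstream sentiment classifier, which is where the abstract reports the accuracy gains.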

Q-LSTM Language Model -- Decentralized Quantum Multilingual Pre-Trained Language Model for Privacy Protection

Oct 06, 2022
Shuyue Stella Li, Xiangyu Zhang, Shu Zhou, Hongchao Shu, Ruixing Liang, Hexin Liu, Leibny Paola Garcia

Large-scale language models are trained on massive amounts of natural language data that might encode or reflect our private information. With careful manipulation, malicious agents can reverse-engineer the training data even if data sanitization and differential privacy algorithms were involved in the pre-training process. In this work, we propose a decentralized training framework to address privacy concerns in training large-scale language models. The framework consists of a cloud quantum language model built with Variational Quantum Classifiers (VQC) for sentence embedding and a local Long Short-Term Memory (LSTM) model. We use both intrinsic evaluation (loss, perplexity) and extrinsic evaluation (a downstream sentiment analysis task) to assess the performance of our quantum language model. Our quantum model is comparable to its classical counterpart on all of these metrics. We also perform ablation studies to examine the effect of the VQC size and the training data size on model performance. Our approach addresses privacy concerns without sacrificing downstream task performance. The intractability of quantum operations on classical hardware ensures the confidentiality of the training data and makes it impossible for any adversary to recover.

* 5 pages, 3 figures, 3 tables 
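
A very rough sketch of the decentralized split follows. Because the abstract does not specify the quantum circuit, qubit count, or cloud/local protocol, the VQC sentence embedder is represented here only by a classical placeholder module; everything in this snippet is an illustrative assumption rather than the paper's implementation.

```python
# Conceptual sketch of the decentralized setup: a cloud-side sentence embedder
# (a classical stand-in for the VQC) produces embeddings consumed by a local
# LSTM language model. All module names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class CloudEmbedder(nn.Module):
    """Classical placeholder for the cloud-hosted VQC sentence embedder."""
    def __init__(self, vocab_size: int, embed_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)

    def forward(self, token_ids):       # (batch, seq_len)
        return self.embed(token_ids)    # (batch, seq_len, embed_dim)

class LocalLSTMLM(nn.Module):
    """Local language model operating on embeddings produced by the cloud embedder."""
    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, embeddings):      # (batch, seq_len, embed_dim)
        h, _ = self.lstm(embeddings)
        return self.out(h)              # next-token logits
```

In the paper's framing, the embedder runs in the cloud and the LSTM locally; the privacy argument rests on the claimed classical intractability of inverting the quantum embedding, which a classical stand-in like the one above obviously does not provide.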

Investigating self-supervised learning for lyrics recognition

Sep 28, 2022
Xiangyu Zhang, Zhanhong He, Shuyue Stella Li, Roberto Togneri, Leibny Paola Garcia

Lyrics recognition is an important task in music processing. Although traditional approaches such as the hybrid HMM-TDNN model achieve good performance, studies applying end-to-end models and self-supervised learning (SSL) remain limited. In this paper, we first establish an end-to-end baseline for lyrics recognition and then explore the performance of SSL models. We evaluate four upstream SSL models grouped by their training objective (masked reconstruction, masked prediction, autoregressive reconstruction, and contrastive learning). After applying the SSL models, the best performance improves by 5.23% on the dev set and 2.4% on the test set over the previous state-of-the-art baseline system, even without a language model trained on a large corpus. Moreover, we study the generalization ability of the SSL features, given that these models were not trained on music datasets.
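
The abstract does not describe the feature pipeline; as one hedged example of what "applying an upstream SSL model" can look like in practice, the snippet below extracts wav2vec 2.0 features with torchaudio to feed a downstream recognizer. The specific checkpoint, file name, and the use of the last layer's features are assumptions, not the paper's setup.

```python
# Hedged example: extract frame-level features from a pre-trained SSL upstream
# model (wav2vec 2.0 via torchaudio) for a downstream lyrics-recognition head.
# The checkpoint choice and audio file are illustrative assumptions.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE        # one possible SSL upstream
model = bundle.get_model().eval()

waveform, sr = torchaudio.load("song_clip.wav")    # hypothetical vocal segment
if sr != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.no_grad():
    features, _ = model.extract_features(waveform)  # list of per-layer outputs
frame_reps = features[-1]                           # (batch, frames, dim) fed to the recognizer
```

Because these upstream models were trained on speech rather than music, how well such features transfer to singing voice is the generalization question the abstract raises.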

Genetic Improvement in the Shackleton Framework for Optimizing LLVM Pass Sequences

Apr 28, 2022
Shuyue Stella Li, Hannah Peeler, Andrew N. Sloss, Kenneth N. Reid, Wolfgang Banzhaf

Genetic improvement (GI) is a search technique that aims to improve a given acceptable solution to a problem. In this paper, we present the novel use of genetic improvement to find problem-specific optimized LLVM pass sequences. We develop a pass-level patch representation in the linear genetic programming framework Shackleton to evolve the modifications applied to the default optimization pass sequences. Our GI-evolved solution achieves a mean runtime improvement of 3.7% over the -O3 optimization level, the default code-generation option that optimizes for runtime. The proposed GI method provides an automatic way to find a problem-specific optimization sequence that improves upon a general solution without any expert domain knowledge. We conclude by discussing the advantages and limitations of the GI feature in the Shackleton Framework and presenting our results.

* 3 pages, 2 figures 
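
As a loose illustration of pass-level genetic improvement (not the Shackleton implementation itself), the sketch below mutates a seed pass sequence and keeps variants whose measured runtime improves. The pass pool, mutation operators, and benchmark harness are simplified assumptions, and the legacy `opt` flag syntax shown may differ across LLVM versions.

```python
# Toy genetic-improvement loop over LLVM pass sequences: start from a seed
# sequence, apply pass-level edits (insert/delete/swap), and keep the variant
# with the best measured runtime. The fitness harness is a placeholder.
import random
import subprocess
import time

PASS_POOL = ["-mem2reg", "-instcombine", "-gvn", "-licm", "-loop-unroll", "-sccp"]

def mutate(seq, rng):
    seq = list(seq)
    op = rng.choice(("insert", "delete", "swap"))
    if op == "insert" or not seq:
        seq.insert(rng.randrange(len(seq) + 1), rng.choice(PASS_POOL))
    elif op == "delete":
        seq.pop(rng.randrange(len(seq)))
    else:
        i, j = rng.randrange(len(seq)), rng.randrange(len(seq))
        seq[i], seq[j] = seq[j], seq[i]
    return seq

def runtime(pass_seq, bitcode="bench.bc"):
    """Placeholder fitness: optimize with opt, compile, run, and time the binary."""
    subprocess.run(["opt", *pass_seq, bitcode, "-o", "opt.bc"], check=True)
    subprocess.run(["clang", "opt.bc", "-o", "bench"], check=True)
    start = time.perf_counter()
    subprocess.run(["./bench"], check=True)
    return time.perf_counter() - start

def genetic_improvement(seed_seq, generations=20, seed=0):
    rng = random.Random(seed)
    best, best_time = list(seed_seq), runtime(seed_seq)
    for _ in range(generations):
        candidate = mutate(best, rng)
        t = runtime(candidate)
        if t < best_time:
            best, best_time = candidate, t
    return best, best_time
```

Shackleton's actual GI feature evolves pass-level patches to the default sequences within its linear-GP framework rather than this single-individual hill climb; the fitness signal (measured runtime compared against -O3) is the same kind of comparison the abstract reports.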

Optimizing LLVM Pass Sequences with Shackleton: A Linear Genetic Programming Framework

Jan 31, 2022
Hannah Peeler, Shuyue Stella Li, Andrew N. Sloss, Kenneth N. Reid, Yuan Yuan, Wolfgang Banzhaf

In this paper we introduce Shackleton as a generalized framework enabling the application of linear genetic programming -- a technique under the umbrella of evolutionary algorithms -- to a variety of use cases. We also explore a novel application for this class of methods: optimizing sequences of LLVM optimization passes. The algorithm underpinning Shackleton is discussed, with an emphasis on the effects of framework-specific features when applied to LLVM pass sequences. Combined with an analysis of different hyperparameter settings, we report results on automatically optimizing pass sequences with Shackleton for two software applications of differing complexity. Finally, we reflect on the advantages and limitations of our current implementation and lay out a path for further improvements, which aim to surpass hand-crafted solutions with an automatic discovery method for an optimal pass sequence.

* 11 pages (with references), 14 figures, 8 tables 
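
For readers unfamiliar with linear genetic programming, the skeleton below shows one generic way an LGP individual for this use case could be represented: a flat list of pass names evolved with one-point crossover and tournament selection against a user-supplied fitness function. It is a textbook-style sketch, not Shackleton's actual representation, operators, or hyperparameters.

```python
# Generic linear-GP skeleton for evolving LLVM pass sequences: individuals are
# flat lists of pass names, recombined by one-point crossover and selected by
# tournament on a user-supplied fitness (e.g., measured runtime, lower is better).
import random

def random_individual(pass_pool, length, rng):
    return [rng.choice(pass_pool) for _ in range(length)]

def one_point_crossover(a, b, rng):
    cut = rng.randrange(1, min(len(a), len(b)))
    return a[:cut] + b[cut:]

def tournament(population, fitnesses, rng, k=3):
    contenders = rng.sample(range(len(population)), k)
    return population[min(contenders, key=lambda i: fitnesses[i])]

def evolve(pass_pool, fitness_fn, pop_size=30, length=8, generations=50, seed=0):
    rng = random.Random(seed)
    population = [random_individual(pass_pool, length, rng) for _ in range(pop_size)]
    for _ in range(generations):
        fitnesses = [fitness_fn(ind) for ind in population]
        population = [
            one_point_crossover(tournament(population, fitnesses, rng),
                                tournament(population, fitnesses, rng), rng)
            for _ in range(pop_size)
        ]
    return min(population, key=fitness_fn)
```

The hyperparameters exposed here (population size, sequence length, generations, tournament size) are the kind of settings whose effects the paper analyzes for Shackleton.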