Irina Illina

LORIA

Mixture of LoRA Experts for Low-Resourced Multi-Accent Automatic Speech Recognition

May 26, 2025

Raphaël Bagat, Irina Illina, Emmanuel Vincent

Abstract: We aim to improve the robustness of Automatic Speech Recognition (ASR) systems against non-native speech, particularly in low-resourced multi-accent settings. We introduce Mixture of Accent-Specific LoRAs (MAS-LoRA), a fine-tuning method that leverages a mixture of Low-Rank Adaptation (LoRA) experts, each specialized in a specific accent. This method can be used when the accent is known or unknown at inference time, without the need to fine-tune the model again. Our experiments, conducted using Whisper on the L2-ARCTIC corpus, demonstrate significant improvements in Word Error Rate compared to regular LoRA and full fine-tuning when the accent is unknown. When the accent is known, the results further improve. Furthermore, MAS-LoRA shows less catastrophic forgetting than the other fine-tuning methods. To the best of our knowledge, this is the first use of a mixture of LoRA experts for non-native multi-accent ASR.

* Submitted to Interspeech 2025
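
The idea of mixing accent-specific LoRA experts on top of a frozen layer can be illustrated with a minimal sketch. The abstract does not say how the experts are combined, so the choices here are assumptions for illustration only: a one-hot selection when the accent is known, a uniform average when it is unknown, and hypothetical names such as `mixture_forward`, `accent_a` and `accent_b`.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_delta(expert, x):
    """Low-rank update B @ (A @ x) for one expert (A: r x d_in, B: d_out x r)."""
    return matvec(expert["B"], matvec(expert["A"], x))


def mixture_forward(W, experts, x, accent=None):
    """Frozen layer output plus a weighted sum of accent-specific LoRA updates.

    Assumption: a known accent selects its own expert (one-hot weights);
    an unknown accent falls back to a uniform average over all experts.
    """
    if accent is not None:
        weights = {name: 1.0 if name == accent else 0.0 for name in experts}
    else:
        weights = {name: 1.0 / len(experts) for name in experts}
    out = matvec(W, x)  # frozen pretrained weights
    for name, expert in experts.items():
        delta = lora_delta(expert, x)
        out = [o + weights[name] * d for o, d in zip(out, delta)]
    return out


# Toy 2x2 layer with two rank-1 experts (values are made up).
W = [[1.0, 0.0], [0.0, 1.0]]
experts = {
    "accent_a": {"A": [[1.0, 0.0]], "B": [[2.0], [0.0]]},
    "accent_b": {"A": [[0.0, 1.0]], "B": [[0.0], [4.0]]},
}
x = [1.0, 1.0]
known = mixture_forward(W, experts, x, accent="accent_a")
unknown = mixture_forward(W, experts, x)
```

The same set of experts serves both cases without retraining; only the mixing weights change at inference time.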

Lillama: Large Language Models Compression via Low-Rank Feature Distillation

Dec 28, 2024

Yaya Sy, Christophe Cerisara, Irina Illina

Abstract: Current LLM structured pruning methods typically involve two steps: (1) compression with calibration data and (2) costly continued pretraining on billions of tokens to recover lost performance. This second step is necessary as the first significantly impacts model accuracy. Prior research suggests pretrained Transformer weights aren't inherently low-rank, unlike their activations, which may explain this drop. Based on this observation, we propose Lillama, a compression method that locally distills activations with low-rank weights. Using SVD for initialization and a joint loss combining teacher and student activations, we accelerate convergence and reduce memory use with local gradient updates. Lillama compresses Mixtral-8x7B within minutes on a single A100 GPU, removing 10 billion parameters while retaining over 95% of its original performance. Phi-2 3B can be compressed by 40% with just 13 million calibration tokens, resulting in a small model that competes with recent models of similar size. The method generalizes well to non-transformer architectures, compressing Mamba-3B by 20% while maintaining 99% performance.

* 20 pages, 8 figures

Large Language Models Compression via Low-Rank Feature Distillation

Dec 21, 2024

Yaya Sy, Christophe Cerisara, Irina Illina

Abstract: Current LLM structured pruning methods involve two steps: (1) compressing with calibration data and (2) continued pretraining on billions of tokens to recover the lost performance. This costly second step is needed as the first step significantly impacts performance. Previous studies have found that pretrained Transformer weights aren't inherently low-rank, unlike their activations, which may explain this performance drop. Based on this observation, we introduce a one-shot compression method that locally distills low-rank weights. We accelerate convergence by initializing the low-rank weights with SVD and using a joint loss that combines teacher and student activations. We reduce memory requirements by applying local gradient updates only. Our approach can compress Mixtral-8x7B within minutes on a single A100 GPU, removing 10 billion parameters while maintaining over 95% of the original performance. Phi-2 3B can be compressed by 40% using only 13 million calibration tokens into a small model that competes with recent models of similar size. We show our method generalizes well to non-transformer architectures: Mamba-3B can be compressed by 20% while maintaining 99% of its performance.

* 20 pages, 8 figures

SAMbA: Speech enhancement with Asynchronous ad-hoc Microphone Arrays

Jul 31, 2023

Nicolas Furnon, Romain Serizel, Slim Essid, Irina Illina

Abstract: Speech enhancement in ad-hoc microphone arrays is often hindered by the asynchronization of the devices composing the microphone array. Asynchronization comes from sampling time offset and sampling rate offset, which inevitably occur when the microphones are embedded in different hardware components. In this paper, we propose a deep neural network (DNN)-based speech enhancement solution that is suited for applications in ad-hoc microphone arrays because it is distributed and copes with asynchronization. We show that asynchronization has a limited impact on the spatial filtering and mostly affects the performance of the DNNs. Instead of resynchronising the signals, which requires costly processing steps, we use an attention mechanism which makes the DNNs, and thus our whole pipeline, robust to asynchronization. We also show that the attention mechanism leads to estimates of the asynchronization parameters in an unsupervised manner.

* Submitted to INTERSPEECH 2022

Transferring Knowledge via Neighborhood-Aware Optimal Transport for Low-Resource Hate Speech Detection

Oct 17, 2022

Tulika Bose, Irina Illina, Dominique Fohr

Abstract: The concerning rise of hateful content on online platforms has increased the attention towards automatic hate speech detection, commonly formulated as a supervised classification task. State-of-the-art deep learning-based approaches usually require a substantial amount of labeled resources for training. However, annotating hate speech resources is expensive, time-consuming, and often harmful to the annotators. This creates a pressing need to transfer knowledge from the existing labeled resources to low-resource hate speech corpora with the goal of improving system performance. For this, neighborhood-based frameworks have been shown to be effective. However, they have limited flexibility. In our paper, we propose a novel training strategy that allows flexible modeling of the relative proximity of neighbors retrieved from a resource-rich corpus to learn the amount of transfer. In particular, we incorporate neighborhood information with Optimal Transport, which permits exploiting the geometry of the data embedding space. By aligning the joint embedding and label distributions of neighbors, we demonstrate substantial improvements over strong baselines, in low-resource scenarios, on different publicly available hate speech corpora.

* AACL-IJCNLP 2022 preprint

Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection

Sep 18, 2022

Tulika Bose, Nikolaos Aletras, Irina Illina, Dominique Fohr

Abstract: State-of-the-art approaches for hate-speech detection usually exhibit poor performance in out-of-domain settings. This typically occurs because classifiers overemphasize source-specific information, which negatively impacts their domain invariance. Prior work has attempted to penalize terms related to hate-speech from manually curated lists using feature attribution methods, which quantify the importance assigned to input terms by the classifier when making a prediction. We, instead, propose a domain adaptation approach that automatically extracts and penalizes source-specific terms using a domain classifier, which learns to differentiate between domains, and feature-attribution scores for hate-speech classes, yielding consistent improvements in cross-domain evaluation.

* COLING 2022 pre-print

Placing M-Phasis on the Plurality of Hate: A Feature-Based Corpus of Hate Online

Apr 28, 2022

Dana Ruiter, Liane Reiners, Ashwin Geet D'Sa, Thomas Kleinbauer, Dominique Fohr, Irina Illina, Dietrich Klakow, Christian Schemer, Angeliki Monnier

Abstract: Even though hate speech (HS) online has been an important object of research in the last decade, most HS-related corpora over-simplify the phenomenon of hate by attempting to label user comments as "hate" or "neutral". This ignores the complex and subjective nature of HS, which limits the real-life applicability of classifiers trained on these corpora. In this study, we present the M-Phasis corpus, a corpus of ~9k German and French user comments collected from migration-related news articles. It goes beyond the "hate"-"neutral" dichotomy and is instead annotated with 23 features, which in combination become descriptors of various types of speech, ranging from critical comments to implicit and explicit expressions of hate. The annotations are performed by 4 native speakers per language and achieve high (0.77 ≤ κ ≤ 1) inter-annotator agreements. Besides describing the corpus creation and presenting insights from a content, error and domain analysis, we explore its data characteristics by training several classification baselines.

* 14 pages, 4 figures, accepted at LREC 2022 (Full Paper)
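
For readers unfamiliar with kappa-style agreement scores like the range quoted in the abstract, the sketch below computes Cohen's kappa for one pair of annotators. The abstract does not say which kappa variant the authors used with four annotators, so treating it as pairwise Cohen's kappa is an assumption, and the labels are invented toy data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labelled independently
    # according to their own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy binary feature annotated by two hypothetical annotators.
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 1, 0, 1]
kappa = cohens_kappa(annotator_1, annotator_2)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, which is why a range starting at 0.77 is described as high.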

Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection

Mar 23, 2022

Tulika Bose, Nikolaos Aletras, Irina Illina, Dominique Fohr

Abstract: Hate speech classifiers exhibit substantial performance degradation when evaluated on datasets different from the source. This is due to learning spurious correlations between words that are not necessarily relevant to hateful language, and hate speech labels from the training corpus. Previous work has attempted to mitigate this problem by regularizing specific terms from pre-defined static dictionaries. While this has been demonstrated to improve the generalizability of classifiers, the coverage of such methods is limited and the dictionaries require regular manual updates from human experts. In this paper, we propose to automatically identify and reduce spurious correlations using attribution methods with dynamic refinement of the list of terms that need to be regularized during training. Our approach is flexible and improves the cross-corpora performance over previous work independently and in combination with pre-defined dictionaries.

* Findings of ACL 2022 preprint

Attention-based distributed speech enhancement for unconstrained microphone arrays with varying number of nodes

Jun 15, 2021

Nicolas Furnon, Romain Serizel, Slim Essid, Irina Illina

Abstract: Speech enhancement promises higher efficiency in ad-hoc microphone arrays than in constrained microphone arrays thanks to the wide spatial coverage of the devices in the acoustic scene. However, speech enhancement in ad-hoc microphone arrays still raises many challenges. In particular, the algorithms should be able to handle a variable number of microphones, as some devices in the array might appear or disappear. In this paper, we propose a solution that can efficiently process the spatial information captured by the different devices of the microphone array, while being robust to a link failure. To do this, we use an attention mechanism in order to put more weight on the relevant signals sent throughout the array and to neglect the redundant or empty channels.

* European Signal Processing Conference (EUSIPCO), IEEE, Aug 2021, Dublin, Ireland
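
The weighting idea in the abstract, more weight on relevant channels and none on missing ones, can be sketched as a softmax over per-channel relevance scores. This is only an illustration under stated assumptions: the scores are given as inputs rather than learned, a missing device is represented by `None`, and `attention_fuse` is a hypothetical name, not the authors' network.

```python
import math


def attention_fuse(channels, scores):
    """Fuse a variable number of channels with softmax attention weights.

    channels: list of equal-length signals, or None for a device that dropped out.
    scores:   one relevance score per channel (assumed given in this sketch).
    """
    # Drop channels whose link failed so they receive no weight at all.
    active = [(c, s) for c, s in zip(channels, scores) if c is not None]
    if not active:
        raise ValueError("no active channel to fuse")
    # Numerically stable softmax over the remaining scores.
    top = max(s for _, s in active)
    exps = [math.exp(s - top) for _, s in active]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the active signals, sample by sample.
    length = len(active[0][0])
    fused = [
        sum(w * c[t] for w, (c, _) in zip(weights, active))
        for t in range(length)
    ]
    return fused, weights


# Three devices, the second one has disappeared from the array.
signals = [[1.0, 2.0], None, [3.0, 4.0]]
relevance = [0.0, 5.0, 0.0]
fused, weights = attention_fuse(signals, relevance)
```

Because the weights are renormalised over whichever channels are present, the same function handles any number of devices.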

Improving Automatic Hate Speech Detection with Multiword Expression Features

Jun 01, 2021

Nicolas Zampieri, Irina Illina, Dominique Fohr

Abstract: The task of automatically detecting hate speech in social media is gaining more and more attention. Given the enormous volume of content posted daily, human monitoring of hate speech is unfeasible. In this work, we propose new word-level features for automatic hate speech detection (HSD): multiword expressions (MWEs). MWEs are lexical units greater than a word that have idiomatic and compositional meanings. We propose to integrate MWE features in a deep neural network-based HSD framework. Our baseline HSD system relies on Universal Sentence Encoder (USE). To incorporate MWE features, we create a three-branch deep neural network: one branch for USE, one for MWE categories, and one for MWE embeddings. We conduct experiments on two hate speech tweet corpora with different MWE categories and with two types of MWE embeddings, word2vec and BERT. Our experiments demonstrate that the proposed HSD system with MWE features significantly outperforms the baseline system in terms of macro-F1.

* In Proceedings of NLDB 2021
