Mingchi Hou

Data Augmentation for Pathological Speech Enhancement

Feb 16, 2026

Mingchi Hou, Enno Hermann, Ina Kodrasi

Abstract: The performance of state-of-the-art speech enhancement (SE) models degrades considerably for pathological speech due to atypical acoustic characteristics and limited data availability. This paper systematically investigates data augmentation (DA) strategies to improve SE performance for pathological speakers, evaluating both predictive and generative SE models. We examine three DA categories, i.e., transformative, generative, and noise augmentation, assessing their impact with objective SE metrics. Experimental results show that noise augmentation consistently delivers the largest and most robust gains, transformative augmentations provide moderate improvements, while generative augmentation yields limited benefits and can harm performance as the amount of synthetic data increases. Furthermore, we show that the effectiveness of DA varies depending on the SE model, with DA being more beneficial for predictive SE models. While our results demonstrate that DA improves SE performance for pathological speakers, a performance gap between neurotypical and pathological speech persists, highlighting the need for future research on targeted DA strategies for pathological speech.
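The abstract does not say how noise augmentation is implemented. As a rough illustration only, one common way to create noisy training pairs is to add a randomly chosen noise segment to a clean utterance at a target signal-to-noise ratio (SNR). The sketch below assumes this approach; the function name `mix_at_snr` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db, rng=None):
    """Return clean + scaled noise so that the mixture has the requested SNR (dB).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat the noise if it is shorter than the clean signal.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)

    # Pick a random noise segment with the same length as the clean signal.
    start = rng.integers(0, len(noise) - len(clean) + 1)
    segment = noise[start:start + len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(segment ** 2)
    if clean_power == 0 or noise_power == 0:
        # Nothing sensible to scale; return the clean signal unchanged.
        return clean.copy()

    # Scale the noise so that 10*log10(clean_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * segment
```

Under this assumption, each clean utterance could be mixed with different noise segments and SNRs during training, so the model sees more noisy variants without requiring additional pathological recordings.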

Variational Autoencoder for Personalized Pathological Speech Enhancement

Mar 18, 2025

Mingchi Hou, Ina Kodrasi

Abstract: The generalizability of speech enhancement (SE) models across speaker conditions remains largely unexplored, despite its critical importance for broader applicability. This paper investigates the performance of the hybrid variational autoencoder (VAE)-non-negative matrix factorization (NMF) model for SE, focusing primarily on its generalizability to pathological speakers with Parkinson's disease. We show that VAE models trained on large neurotypical datasets perform poorly on pathological speech. While fine-tuning these pre-trained models with pathological speech improves performance, a performance gap remains between neurotypical and pathological speakers. To address this gap, we propose using personalized SE models derived from fine-tuning pre-trained models with only a few seconds of clean data from each speaker. Our results demonstrate that personalized models considerably enhance performance for all speakers, achieving comparable results for both neurotypical and pathological speakers.

* Submitted to EUSIPCO 2025
