Xiaodan Chen

ETIS, A*STAR, IPAL

Confidence-Based Self-Training for EMG-to-Speech: Leveraging Synthetic EMG for Robust Modeling

Jun 13, 2025

Xiaodan Chen, Xiaoxue Gao, Mathias Quoy, Alexandre Pitti, Nancy F. Chen

Abstract: Voiced Electromyography (EMG)-to-Speech (V-ETS) models reconstruct speech from muscle activity signals, facilitating applications such as neurolaryngologic diagnostics. Despite its potential, the advancement of V-ETS is hindered by a scarcity of paired EMG-speech data. To address this, we propose a novel Confidence-based Multi-Speaker Self-training (CoM2S) approach, along with a newly curated Libri-EMG dataset. The approach generates synthetic EMG data with a pre-trained model, filters it with a proposed mechanism based on phoneme-level confidence, and uses the retained data to enhance the ETS model through the proposed self-training techniques. Experiments show that our method improves phoneme accuracy, reduces phonological confusion, and lowers word error rate, confirming the effectiveness of our CoM2S approach for V-ETS. In support of future research, we will release the code and the proposed Libri-EMG dataset, an open-access, time-aligned, multi-speaker collection of voiced EMG and speech recordings.
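As a rough illustration of what filtering synthetic data by phoneme-level confidence could look like, the sketch below keeps only synthetic utterances whose average per-phoneme confidence reaches a threshold. The abstract does not say how confidences are computed or aggregated, so the `SyntheticSample` type, the mean aggregation, and the 0.8 threshold are all assumptions made for illustration, not the authors' CoM2S implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SyntheticSample:
    """Hypothetical container for one synthetic EMG utterance."""
    utt_id: str
    # Assumed: one confidence score in [0, 1] per predicted phoneme.
    phoneme_confidences: List[float]


def utterance_confidence(sample: SyntheticSample) -> float:
    """Aggregate phoneme-level scores into one value (mean is an assumption)."""
    if not sample.phoneme_confidences:
        return 0.0  # no phonemes recognized: treat as zero confidence
    return sum(sample.phoneme_confidences) / len(sample.phoneme_confidences)


def filter_by_confidence(
    samples: List[SyntheticSample], threshold: float = 0.8
) -> List[SyntheticSample]:
    """Keep samples whose aggregated confidence meets the (assumed) threshold."""
    return [s for s in samples if utterance_confidence(s) >= threshold]


# Toy data with made-up scores, only to show the call pattern.
samples = [
    SyntheticSample("utt_a", [0.95, 0.90, 0.85]),
    SyntheticSample("utt_b", [0.60, 0.40, 0.90]),
    SyntheticSample("utt_c", []),
]
kept = filter_by_confidence(samples, threshold=0.8)
kept_ids = [s.utt_id for s in kept]
print(kept_ids)
```

In a self-training loop, the retained samples would be added to the training set for the next round; a stricter threshold trades data quantity for cleaner pseudo-labeled data.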

Developmental Predictive Coding Model for Early Infancy Mono and Bilingual Vocal Continual Learning

Dec 23, 2024

Xiaodan Chen, Alexandre Pitti, Mathias Quoy, Nancy F Chen

Abstract: Understanding how infants perceive speech sounds and language structures is still an open problem. Previous research in artificial neural networks has mainly focused on large dataset-dependent generative models, aiming to replicate language-related phenomena such as "perceptual narrowing". In this paper, we propose a novel approach using a small-sized generative neural network equipped with a continual learning mechanism based on predictive coding for mono- and bilingual speech sound learning (referred to as language sound acquisition during the "critical period") and a compositional optimization mechanism for generation where no learning is involved (later infancy sound imitation). Our model prioritizes interpretability and demonstrates the advantages of online learning: unlike deep networks requiring substantial offline training, our model continuously updates with new data, making it adaptable and responsive to changing inputs. Through experiments, we demonstrate that if second language acquisition occurs during later infancy, the challenges associated with learning a foreign language after the critical period amplify, replicating the perceptual narrowing effect.

* Artificial Neural Networks and Machine Learning -- ICANN 2024, Sep 2024, Lugano, Switzerland, pp. 16-32
