Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Marc Ouellet

End-to-End Intracortical Speech Decoding from Neural Activity

May 23, 2026

Owais Mujtaba Khanday, Jose A. Gonzalez-Lopez, Marc Ouellet, Alberto Galdon, Gonzalo Olivares Granados

Abstract:Current high-performing intracortical speech neuroprostheses achieve low word error rates but typically rely on external language models during inference, increasing memory, computation, and latency. In this work, we investigate whether meaningful character-level decoding is achievable without such models. We propose an end-to-end Conformer-based neural decoder trained directly on intracortical recordings from a participant with amyotrophic lateral sclerosis (ALS). Without any external language model, the system achieves a character error rate (CER) of 23.80\% on held-out validation data. Analysis shows that performance variability is driven by inter-session signal degradation, while dominant errors arise from incorrect word boundary segmentation. These results demonstrate that effective character-level decoding is possible in a fully end-to-end framework, providing a strong neural signal for downstream linguistic processing.

* Accepted at Odyssey 2026 (Lisbon)

Via

Access Paper or Ask Questions

Recreating Neural Activity During Speech Production with Language and Speech Model Embeddings

May 21, 2025

Owais Mujtaba Khanday, Pablo Rodroguez San Esteban, Zubair Ahmad Lone, Marc Ouellet, Jose Andres Gonzalez Lopez

Abstract:Understanding how neural activity encodes speech and language production is a fundamental challenge in neuroscience and artificial intelligence. This study investigates whether embeddings from large-scale, self-supervised language and speech models can effectively reconstruct high-gamma neural activity characteristics, key indicators of cortical processing, recorded during speech production. We leverage pre-trained embeddings from deep learning models trained on linguistic and acoustic data to represent high-level speech features and map them onto these high-gamma signals. We analyze the extent to which these embeddings preserve the spatio-temporal dynamics of brain activity. Reconstructed neural signals are evaluated against high-gamma ground-truth activity using correlation metrics and signal reconstruction quality assessments. The results indicate that high-gamma activity can be effectively reconstructed using large language and speech model embeddings in all study participants, generating Pearson's correlation coefficients ranging from 0.79 to 0.99.

* Accepted for presentation at Interspeech2025

Via

Access Paper or Ask Questions