Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the voice and language in an audio signal and accurately transcribing those words as text.

StutterZero and StutterFormer: End-to-End Speech Conversion for Stuttering Transcription and Correction

Oct 21, 2025

Accent-Invariant Automatic Speech Recognition via Saliency-Driven Spectrogram Masking

Oct 10, 2025

Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition

Oct 09, 2025

Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation

Oct 08, 2025

Structured Sparsity and Weight-adaptive Pruning for Memory and Compute efficient Whisper models

Oct 14, 2025

Machine Unlearning in Speech Emotion Recognition via Forget Set Alone

Oct 05, 2025

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models

Oct 06, 2025

EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models

Oct 26, 2025

MedVoiceBias: A Controlled Study of Audio LLM Behavior in Clinical Decision-Making

Nov 10, 2025

Enhancing Speech Emotion Recognition via Fine-Tuning Pre-Trained Models and Hyper-Parameter Optimisation

Oct 08, 2025