Speech Recognition


Speech recognition is the task of identifying spoken words by analyzing the voice signal and the language, and transcribing them accurately as text.

EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models

Oct 26, 2025
Via arXiv

Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation

Oct 08, 2025
Via arXiv

Structured Sparsity and Weight-adaptive Pruning for Memory and Compute efficient Whisper models

Oct 14, 2025
Via arXiv

Machine Unlearning in Speech Emotion Recognition via Forget Set Alone

Oct 05, 2025
Via arXiv

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models

Oct 06, 2025
Via arXiv

Enhancing Speech Emotion Recognition via Fine-Tuning Pre-Trained Models and Hyper-Parameter Optimisation

Oct 08, 2025
Via arXiv

Bloodroot: When Watermarking Turns Poisonous For Stealthy Backdoor

Oct 09, 2025
Via arXiv

Decoding Deception: Understanding Automatic Speech Recognition Vulnerabilities in Evasion and Poisoning Attacks

Sep 26, 2025
Via arXiv

How I Built ASR for Endangered Languages with a Spoken Dictionary

Oct 06, 2025
Via arXiv

Evaluating Self-Supervised Speech Models via Text-Based LLMs

Oct 06, 2025
Via arXiv