Speech Recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately as text.

Explainable Transformer-CNN Fusion for Noise-Robust Speech Emotion Recognition

Dec 20, 2025
Via arXiv

Zero-Shot Recognition of Dysarthric Speech Using Commercial Automatic Speech Recognition and Multimodal Large Language Models

Dec 19, 2025
Via arXiv

Incorporating Error Level Noise Embedding for Improving LLM-Assisted Robustness in Persian Speech Recognition

Dec 19, 2025
Via arXiv

From Speech to Subtitles: Evaluating ASR Models in Subtitling Italian Television Programs

Dec 22, 2025
Via arXiv

Navigating the Reality Gap: Privacy-Preserving Adaptation of ASR for Challenging Low-Resource Domains

Dec 22, 2025
Via arXiv

Semantic Codebooks as Effective Priors for Neural Speech Compression

Dec 25, 2025
Via arXiv

Scalable Frameworks for Real-World Audio-Visual Speech Recognition

Dec 16, 2025
Via arXiv

Peeking Into The Future For Contextual Biasing

Dec 19, 2025
Via arXiv

Reproducing and Dissecting Denoising Language Models for Speech Recognition

Dec 15, 2025
Via arXiv

When De-noising Hurts: A Systematic Study of Speech Enhancement Effects on Modern Medical ASR Systems

Dec 19, 2025
Via arXiv