Speech Recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the voice and language, and transcribing those words accurately.

TICL+: A Case Study On Speech In-Context Learning for Children's Speech Recognition

Dec 20, 2025

Phoneme-based speech recognition driven by large language models and sampling marginalization

Dec 20, 2025

Semantic Codebooks as Effective Priors for Neural Speech Compression

Dec 25, 2025

Explainable Transformer-CNN Fusion for Noise-Robust Speech Emotion Recognition

Dec 20, 2025

From Speech to Subtitles: Evaluating ASR Models in Subtitling Italian Television Programs

Dec 22, 2025

Navigating the Reality Gap: Privacy-Preserving Adaptation of ASR for Challenging Low-Resource Domains

Dec 22, 2025

Zero-Shot Recognition of Dysarthric Speech Using Commercial Automatic Speech Recognition and Multimodal Large Language Models

Dec 19, 2025

Incorporating Error Level Noise Embedding for Improving LLM-Assisted Robustness in Persian Speech Recognition

Dec 19, 2025

Peeking Into The Future For Contextual Biasing

Dec 19, 2025

Scalable Frameworks for Real-World Audio-Visual Speech Recognition

Dec 16, 2025