Speech Recognition


Speech recognition is the task of identifying spoken words by analyzing the voice and the language, and transcribing them accurately as text.

Explainable Transformer-CNN Fusion for Noise-Robust Speech Emotion Recognition

Dec 20, 2025
Via arXiv

Zero-Shot Recognition of Dysarthric Speech Using Commercial Automatic Speech Recognition and Multimodal Large Language Models

Dec 19, 2025
Via arXiv

Incorporating Error Level Noise Embedding for Improving LLM-Assisted Robustness in Persian Speech Recognition

Dec 19, 2025
Via arXiv

Peeking Into The Future For Contextual Biasing

Dec 19, 2025
Via arXiv

When De-noising Hurts: A Systematic Study of Speech Enhancement Effects on Modern Medical ASR Systems

Dec 19, 2025
Via arXiv

Scalable Frameworks for Real-World Audio-Visual Speech Recognition

Dec 16, 2025
Via arXiv

Adaptive Edge-Cloud Inference for Speech-to-Action Systems Using ASR and Large Language Models

Dec 18, 2025
Via arXiv

Reproducing and Dissecting Denoising Language Models for Speech Recognition

Dec 15, 2025
Via arXiv

A stylometric analysis of speaker attribution from speech transcripts

Dec 18, 2025
Via arXiv

GeoSense-AI: Fast Location Inference from Crisis Microblogs

Dec 20, 2025
Via arXiv