Speech recognition


Speech recognition is the task of identifying spoken words by analyzing the speaker's voice and language, then transcribing those words accurately as text.

MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages

Nov 12, 2025

How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-resource Transfer

Nov 15, 2025

WST: Weakly Supervised Transducer for Automatic Speech Recognition

Nov 06, 2025

Tighter Truncated Rectangular Prism Approximation for RNN Robustness Verification

Nov 12, 2025

End-to-end Contrastive Language-Speech Pretraining Model For Long-form Spoken Question Answering

Nov 12, 2025

Quantizing Whisper-small: How design choices affect ASR performance

Nov 11, 2025

Surgical Agent Orchestration Platform for Voice-directed Patient Data Interaction

Nov 11, 2025

Adapting Speech Foundation Models with Large Language Models for Unified Speech Recognition

Oct 27, 2025

Towards Fine-Grained Code-Switch Speech Translation with Semantic Space Alignment

Nov 09, 2025

Accelerating scientific discovery with the common task framework

Nov 06, 2025