Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the voice and language, and transcribing them accurately.

MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages

Nov 12, 2025, via arXiv

TEDxTN: A Three-way Speech Translation Corpus for Code-Switched Tunisian Arabic - English

Nov 13, 2025, via arXiv

AfriSpeech-MultiBench: A Verticalized Multidomain Multicountry Benchmark Suite for African Accented English ASR

Nov 18, 2025, via arXiv

Listen Like a Teacher: Mitigating Whisper Hallucinations using Adaptive Layer Attention and Knowledge Distillation

Nov 18, 2025, via arXiv

How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-resource Transfer

Nov 15, 2025, via arXiv

Tighter Truncated Rectangular Prism Approximation for RNN Robustness Verification

Nov 12, 2025, via arXiv

End-to-end Contrastive Language-Speech Pretraining Model For Long-form Spoken Question Answering

Nov 12, 2025, via arXiv

Quantizing Whisper-small: How design choices affect ASR performance

Nov 11, 2025, via arXiv

Ground Truth Generation for Multilingual Historical NLP using LLMs

Nov 18, 2025, via arXiv

Scriboora: Rethinking Human Pose Forecasting

Nov 19, 2025, via arXiv