Speech recognition


Speech recognition is the task of identifying words spoken aloud, analyzing the voice and language, and accurately transcribing the words.

TEDxTN: A Three-way Speech Translation Corpus for Code-Switched Tunisian Arabic - English

Nov 13, 2025

MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages

Nov 12, 2025

How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-resource Transfer

Nov 15, 2025

Tighter Truncated Rectangular Prism Approximation for RNN Robustness Verification

Nov 12, 2025

End-to-end Contrastive Language-Speech Pretraining Model For Long-form Spoken Question Answering

Nov 12, 2025

Quantizing Whisper-small: How design choices affect ASR performance

Nov 11, 2025

Surgical Agent Orchestration Platform for Voice-directed Patient Data Interaction

Nov 11, 2025

E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis

Nov 10, 2025

Ground Truth Generation for Multilingual Historical NLP using LLMs

Nov 18, 2025

Scriboora: Rethinking Human Pose Forecasting

Nov 19, 2025