Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the voice signal and the language, and transcribing them accurately as text.

Seewo's Submission to MLC-SLM: Lessons learned from Speech Reasoning Language Models

Jun 16, 2025

Bi-directional Context-Enhanced Speech Large Language Models for Multilingual Conversational ASR

Jun 16, 2025

BUT System for the MLC-SLM Challenge

Jun 16, 2025

Unifying Streaming and Non-streaming Zipformer-based ASR

Jun 17, 2025

Joint ASR and Speaker Role Tagging with Serialized Output Training

Jun 12, 2025

Improving Named Entity Transcription with Contextual LLM-based Revision

Jun 12, 2025

SecureSpeech: Prompt-based Speaker and Content Protection

Jul 10, 2025

Exploiting Music Source Separation for Automatic Lyrics Transcription with Whisper

Jun 18, 2025

Transcript-Prompted Whisper with Dictionary-Enhanced Decoding for Japanese Speech Annotation

Jun 09, 2025

Advances in Small-Footprint Keyword Spotting: A Comprehensive Review of Efficient Models and Algorithms

Jun 12, 2025