End-to-End Speech Recognition


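End-to-end speech recognition transcribes speech directly into text with a single model, rather than a pipeline of separately built components such as an acoustic model, a pronunciation lexicon, and a language model.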

End-to-End DOA-Guided Speech Extraction in Noisy Multi-Talker Scenarios

Jul 28, 2025
via arXiv

Step-Audio 2 Technical Report

Jul 24, 2025
via arXiv

Improving Contextual ASR via Multi-grained Fusion with Large Language Models

Jul 16, 2025
via arXiv

Code-Switching in End-to-End Automatic Speech Recognition: A Systematic Literature Review

Jul 10, 2025
via arXiv

End-to-end Acoustic-linguistic Emotion and Intent Recognition Enhanced by Semi-supervised Learning

Jul 10, 2025
via arXiv

Lightweight and Robust Multi-Channel End-to-End Speech Recognition with Spherical Harmonic Transform

Jun 13, 2025
via arXiv

SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition

Jun 15, 2025
via arXiv

Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios

Jun 17, 2025
via arXiv

Regularizing Learnable Feature Extraction for Automatic Speech Recognition

Jun 11, 2025
via arXiv

Unifying Streaming and Non-streaming Zipformer-based ASR

Jun 17, 2025
via arXiv