End-to-End Speech Recognition


End-to-end speech recognition transcribes speech directly into text with a single model, rather than a pipeline of separately trained acoustic, pronunciation, and language models.

Eureka-Audio: Triggering Audio Intelligence in Compact Language Models

Feb 15, 2026 (via arXiv)

Voxtral Realtime

Feb 11, 2026 (via arXiv)

VocalNet-MDM: Accelerating Streaming Speech LLM via Self-Distilled Masked Diffusion Modeling

Feb 09, 2026 (via arXiv)

Equipping LLM with Directional Multi-Talker Speech Understanding Capabilities

Feb 06, 2026 (via arXiv)

MedSpeak: A Knowledge Graph-Aided ASR Error Correction Framework for Spoken Medical QA

Feb 01, 2026 (via arXiv)

EmoAra: Emotion-Preserving English Speech Transcription and Cross-Lingual Translation with Arabic Text-to-Speech

Feb 01, 2026 (via arXiv)

CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR

Jan 30, 2026 (via arXiv)

VIBEVOICE-ASR Technical Report

Jan 26, 2026 (via arXiv)

Distillation-based Layer Dropping (DLD): Effective End-to-end Framework for Dynamic Speech Networks

Jan 27, 2026 (via arXiv)

SW-ASR: A Context-Aware Hybrid ASR Pipeline for Robust Single Word Speech Recognition

Jan 28, 2026 (via arXiv)