Speech recognition


Speech recognition is the task of identifying the words in spoken audio by analyzing the speaker's voice and language, and transcribing them accurately as text.

LLM can Read Spectrogram: Encoder-free Speech-Language Modeling

Jun 08, 2026
Via arXiv

Is Text All You Need? Text as a Universal Information Bottleneck for Speech LLMs

Jun 08, 2026
Via arXiv

TimeLens: On-Device Artifact Recognition with Retrieval-Augmented Question Answering for the Grand Egyptian Museum

Jun 11, 2026
Via arXiv

Contrastive Training with LLM-generated Near-Misses for Robust Code-Switching Speech Recognition

Jun 05, 2026
Via arXiv

End-to-End Training for Discrete Token LLM based TTS System

Jun 08, 2026
Via arXiv

FiLM-Based Speaker Conditioning of a SpeechLLM for Pathological Speech Recognition

Jun 04, 2026
Via arXiv

Real-time body pose non-verbal communication with a consistency-based reliability measure

Jun 08, 2026
Via arXiv

Multi-task Learning is Not Enough: Representational Entanglement in Dual-output Second Language Speech Recognition

Jun 04, 2026
Via arXiv

NüshuVoice: Reviving the Voice of Endangered Nüshu with Pitch-Aware Text-to-Speech

Jun 08, 2026
Via arXiv

M2S-AVSR: Modality-aware Multi-view Self-supervised Representation for Robust Audio-Visual Speech Recognition

Jun 04, 2026
Via arXiv