speech


Learnable Pulse Accumulation for On-Device Speech Recognition: How Much Attention Do You Need?

Mar 11, 2026
Via arXiv

QV May Be Enough: Toward the Essence of Attention in LLMs

Mar 11, 2026
Via arXiv

Multi-View Based Audio Visual Target Speaker Extraction

Mar 11, 2026
Via arXiv

SimulU: Training-free Policy for Long-form Simultaneous Speech-to-Speech Translation

Mar 11, 2026
Via arXiv

Beyond Deep Learning: Speech Segmentation and Phone Classification with Neural Assemblies

Mar 11, 2026
Via arXiv

Duration Aware Scheduling for ASR Serving Under Workload Drift

Mar 11, 2026
Via arXiv

Synthetic Data Domain Adaptation for ASR via LLM-based Text and Phonetic Respelling Augmentation

Mar 11, 2026
Via arXiv

When Fine-Tuning Fails and when it Generalises: Role of Data Diversity and Mixed Training in LLM-based TTS

Mar 11, 2026
Via arXiv

Speaker Verification with Speech-Aware LLMs: Evaluation and Augmentation

Mar 11, 2026
Via arXiv

MoXaRt: Audio-Visual Object-Guided Sound Interaction for XR

Mar 11, 2026
Via arXiv