speech


Low-Latency Real-Time Audio Game Commentary System via LLM-Based Parallel Text Generation

Jun 11, 2026

PRISM: Prosody-Integrated Multi-Agent Reasoning Framework for Empathetic Spoken Dialogue

Jun 11, 2026

UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction

Jun 11, 2026

AfriSUD: A Dependency Treebank Collection for Evaluating Models on African Languages

Jun 10, 2026

BASENet: Band-Adapted Speech Enhancement Network with Cross-Band Attention

Jun 10, 2026

Fast-SDE: Efficient Single-Microphone Sound Source Distance Estimation in Reverberant Environments

Jun 10, 2026

Characterization of Speech Imagery in Scalp EEG and Comparison with Motor Imagery

Jun 10, 2026

Tight Boundary Prediction in Speaker Diarization Using Causal-Anticausal Consistency

Jun 10, 2026

Fast Speech Foundation Model Distillation Using Interleaved Stacking

Jun 10, 2026

M*: A Modular, Extensible, Serving System for Multimodal Models

Jun 10, 2026