speech


Multi-channel multi-speaker transformer for speech recognition

Jan 06, 2026
Via arXiv

Dynamic Quantization Error Propagation in Encoder-Decoder ASR Quantization

Jan 05, 2026
Via arXiv

On the Role of Spatial Features in Foundation-Model-Based Speaker Diarization

Jan 05, 2026
Via arXiv

ARCADE: A City-Scale Corpus for Fine-Grained Arabic Dialect Tagging

Jan 05, 2026
Via arXiv

Towards Multi-Level Transcript Segmentation: LoRA Fine-Tuning for Table-of-Contents Generation

Jan 05, 2026
Via arXiv

VocalBridge: Latent Diffusion-Bridge Purification for Defeating Perturbation-Based Voiceprint Defenses

Jan 05, 2026
Via arXiv

Towards Prosodically Informed Mizo TTS without Explicit Tone Markings

Jan 05, 2026
Via arXiv

What you reward is what you learn: Comparing rewards for online speech policy optimization in public HRI

Jan 05, 2026
Via arXiv

MORE: Multi-Objective Adversarial Attacks on Speech Recognition

Jan 05, 2026
Via arXiv

Quantifying Quanvolutional Neural Networks Robustness for Speech in Healthcare Applications

Jan 05, 2026
Via arXiv