Speech


RedVox: Safety and Fairness Gaps in Speech Models Across Languages

Jun 25, 2026
Via arXiv

SamaVaani: Auditing and Debiasing Multilingual Clinical ASR for Indian Languages

Jun 25, 2026
Via arXiv

Heterogeneous Neural Predictivity from Language Models During Naturalistic Comprehension

Jun 25, 2026
Via arXiv

FBK's Long-form SpeechLLMs for IWSLT 2026 Instruction Following

Jun 25, 2026
Via arXiv

Closing the Quality Gap in Low-Resource Text-to-Speech: LoRA Fine-Tuning of VoxCPM2 for Khmer and Korean

Jun 25, 2026
Via arXiv

VoiceTTA: Enhancing Zero-Shot Text-to-Speech via Reinforcement Learning-Based Test-Time Adaptation

Jun 25, 2026
Via arXiv

Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models

Jun 25, 2026
Via arXiv

wav2tok 2.0: Scalable Audio Tokenization Maintaining Explicit Pairwise Token Alignment for Efficient Audio Retrieval

Jun 25, 2026
Via arXiv

DNSMOS-C: Improving End-to-end Speech Quality Models via Contrastive Learning

Jun 25, 2026
Via arXiv

AnySimLite: A Lightweight Few-Shot Similarity Encoder for On-Device Speech-Adjacent Classification

Jun 24, 2026
Via arXiv