speech


The CMU-AIST submission for the ICME 2025 Audio Encoder Challenge

Jan 22, 2026 (via arXiv)

Qwen3-TTS Technical Report

Jan 22, 2026 (via arXiv)

Sink or SWIM: Tackling Real-Time ASR at Scale

Jan 22, 2026 (via arXiv)

TidyVoice: A Curated Multilingual Dataset for Speaker Verification Derived from Common Voice

Jan 22, 2026 (via arXiv)

DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice

Jan 22, 2026 (via arXiv)

Adaptive Rotary Steering with Joint Autoregression for Robust Extraction of Closely Moving Speakers in Dynamic Scenarios

Jan 21, 2026 (via arXiv)

Performance and Complexity Trade-off Optimization of Speech Models During Training

Jan 21, 2026 (via arXiv)

Contrastive Knowledge Distillation for Embedding Refinement in Personalized Speech Enhancement

Jan 21, 2026 (via arXiv)

VCNAC: A Variable-Channel Neural Audio Codec for Mono, Stereo, and Surround Sound

Jan 21, 2026 (via arXiv)

Deaf and Hard of Hearing Access to Intelligent Personal Assistants: Comparison of Voice-Based Options with an LLM-Powered Touch Interface

Jan 21, 2026 (via arXiv)