speech


DEAF: A Benchmark for Diagnostic Evaluation of Acoustic Faithfulness in Audio Language Models

Mar 17, 2026
Via arXiv

Collecting Prosody in the Wild: A Content-Controlled, Privacy-First Smartphone Protocol and Empirical Evaluation

Mar 17, 2026
Via arXiv

LLM-Guided Reinforcement Learning for Audio-Visual Speech Enhancement

Mar 17, 2026
Via arXiv

Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR

Mar 17, 2026
Via arXiv

Fanar 2.0: Arabic Generative AI Stack

Mar 17, 2026
Via arXiv

HRTF-guided Binaural Target Speaker Extraction with Real-World Validation

Mar 17, 2026
Via arXiv

VorTEX: Various overlap ratio for Target speech EXtraction

Mar 17, 2026
Via arXiv

Speak, Segment, Track, Navigate: An Interactive System for Video-Guided Skull-Base Surgery

Mar 17, 2026
Via arXiv

CAST-TTS: A Simple Cross-Attention Framework for Unified Timbre Control in TTS

Mar 17, 2026
Via arXiv

Attention-guided Evidence Grounding for Spoken Question Answering

Mar 17, 2026
Via arXiv