speech


Neuron-Level Emotion Control in Speech-Generative Large Audio-Language Models

Mar 18, 2026

AURORA Model of Formant-to-Tongue Inversion for Didactic and Clinical Applications

Mar 18, 2026

Towards the Vision-Sound-Language-Action Paradigm: The HEAR Framework for Sound-Centric Manipulation

Mar 17, 2026

Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR

Mar 17, 2026

LLM-Guided Reinforcement Learning for Audio-Visual Speech Enhancement

Mar 17, 2026

Fanar 2.0: Arabic Generative AI Stack

Mar 17, 2026

HRTF-guided Binaural Target Speaker Extraction with Real-World Validation

Mar 17, 2026

VorTEX: Various overlap ratio for Target speech EXtraction

Mar 17, 2026

DEAF: A Benchmark for Diagnostic Evaluation of Acoustic Faithfulness in Audio Language Models

Mar 17, 2026

Speak, Segment, Track, Navigate: An Interactive System for Video-Guided Skull-Base Surgery

Mar 17, 2026