speech


LLM-Guided Reinforcement Learning for Audio-Visual Speech Enhancement

Mar 17, 2026
Via arXiv

Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR

Mar 17, 2026
Via arXiv

Fanar 2.0: Arabic Generative AI Stack

Mar 17, 2026
Via arXiv

HRTF-guided Binaural Target Speaker Extraction with Real-World Validation

Mar 17, 2026
Via arXiv

VorTEX: Various overlap ratio for Target speech EXtraction

Mar 17, 2026
Via arXiv

Speak, Segment, Track, Navigate: An Interactive System for Video-Guided Skull-Base Surgery

Mar 17, 2026
Via arXiv

CAST-TTS: A Simple Cross-Attention Framework for Unified Timbre Control in TTS

Mar 17, 2026
Via arXiv

Attention-guided Evidence Grounding for Spoken Question Answering

Mar 17, 2026
Via arXiv

Is Semi-Automatic Transcription Useful in Corpus Creation? Preliminary Considerations on the KIParla Corpus

Mar 17, 2026
Via arXiv

SEAHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Southeast Asia

Mar 17, 2026
Via arXiv