speech


Assisted Counterspeech Writing at the Crossroads of Hate Speech and Misinformation

May 21, 2026
Via arXiv

Effective User-defined Keyword Spotting with Dual-stage Matching, Multi-modal Enrollment, and Continual Adaptation

May 21, 2026
Via arXiv

Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German

May 21, 2026
Via arXiv

Thinking-while-speaking: A Controlled, Interleaved Reasoning Method for Real-Time Speech Generation

May 20, 2026
Via arXiv

Raon-OpenTTS: Open Models and Data for Robust Text-to-Speech

May 20, 2026
Via arXiv

A Survey of Audio Reasoning in Multimodal Foundation Models

May 20, 2026
Via arXiv

Manga109-v2026: Revisiting Manga109 Annotations for Modern Manga Understanding

May 20, 2026
Via arXiv

LoCar: Localization-Aware Evaluation of In-Vehicle Assistants through Fine-Grained Sociolinguistic Control

May 20, 2026
Via arXiv

MM-Conv: A Multimodal Dataset and Benchmark for Context-Aware Grounding in 3D Dialogue

May 20, 2026
Via arXiv

Text Analytics Evaluation Framework: A Case Study on LLMs and Social Media

May 20, 2026
Via arXiv