speech


2nd of the 5th PVUW MeViS-Audio Track: ASR-SaSaSa2VA

Apr 27, 2026
Via arXiv

All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation

Apr 27, 2026
Via arXiv

Scaling Properties of Continuous Diffusion Spoken Language Models

Apr 27, 2026
Via arXiv

A Comparative Evaluation of AI Agent Security Guardrails

Apr 27, 2026
Via arXiv

Speech Enhancement Based on Drifting Models

Apr 27, 2026
Via arXiv

AMAVA: Adaptive Motion-Aware Video-to-Audio Framework for Visually-Impaired Assistance

Apr 26, 2026
Via arXiv

Talking Slide Avatars: Open-Source Multimodal Communication Approach for Teaching

Apr 26, 2026
Via arXiv

Hallo-Live: Real-Time Streaming Joint Audio-Video Avatar Generation with Asynchronous Dual-Stream and Human-Centric Preference Distillation

Apr 26, 2026
Via arXiv

Human-1 by Josh Talks: A Full-Duplex Conversational Modeling Framework in Hindi using Real-World Conversations

Apr 25, 2026
Via arXiv

Au-M-ol: A Unified Model for Medical Audio and Language Understanding

Apr 25, 2026
Via arXiv