speech


Synergizing Zero-Shot Cross-Lingual Alzheimer Detection with Language-Invariant Multimodal Bi-Geometric Adversarial Learning

Jun 15, 2026

WaveSync: Constrained Wavefront Optimization for Synchronized Co-Speech Gestures in Humanoid Robots

Jun 15, 2026

Scaling Human and G2P Supervision for Robust Phonetic Transcription

Jun 14, 2026

Stringalign: Moving beyond summary statistics with a transparent Unicode-aware tool for evaluating automatic transcription models

Jun 14, 2026

Bridging the SEA Gap: An Initial Benchmark for Neural Audio Codec-Synthesized Speech Deepfakes in South-East Asian Languages

Jun 14, 2026

SiGnature: Explicit Motion Diffusion for Stylized Semantic Gesture

Jun 14, 2026

NVMOS: Non-Verbal Vocalization Quality Assessment in Speech

Jun 14, 2026

EmoZone-Talker: Regional Semantic Control of Audio-Driven 3DGS Talking Heads via Facial Action Units

Jun 14, 2026

MambAdapter: Lightweight Mamba-Based Adapters for Parameter-Efficient Transfer Learning in Speech and Audio

Jun 14, 2026

AP-GRPO: Anchor-Gated Phonetic Alignment with Policy Optimization for Pathological Speech Reconstruction

Jun 14, 2026