speech


Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR

Apr 03, 2026
Via arXiv

Split and Conquer Partial Deepfake Speech

Apr 03, 2026
Via arXiv

SentiAvatar: Towards Expressive and Interactive Digital Humans

Apr 03, 2026
Via arXiv

VisionClaw: Always-On AI Agents through Smart Glasses

Apr 03, 2026
Via arXiv

GAP-URGENet: A Generative-Predictive Fusion Framework for Universal Speech Enhancement

Apr 02, 2026
Via arXiv

Realistic Lip Motion Generation Based on 3D Dynamic Viseme and Coarticulation Modeling for Human-Robot Interaction

Apr 02, 2026
Via arXiv

Human-Guided Reasoning with Large Language Models for Vietnamese Speech Emotion Recognition

Apr 02, 2026
Via arXiv

Acoustic and perceptual differences between standard and accented Chinese speech and their voice clones

Apr 02, 2026
Via arXiv

Validating Computational Markers of Depressive Behavior: Cross-Linguistic Speech-Based Depression Detection with Neurophysiological Validation

Apr 02, 2026
Via arXiv

Reverberation-Robust Localization of Speakers Using Distinct Speech Onsets and Multi-channel Cross-Correlations

Apr 02, 2026
Via arXiv