Speaker


PoDAR: Power-Disentangled Audio Representation for Generative Modeling

May 11, 2026
Via arXiv

Initiation of Interaction Detection Framework using a Nonverbal Cue for Human-Robot Interaction

May 11, 2026
Via arXiv

Single-Microphone Audio Point Source Discriminative Localization From Reverberation Late Tail Estimation

May 10, 2026
Via arXiv

Kinetic-Optimal Scheduling with Moment Correction for Metric-Induced Discrete Flow Matching in Zero-Shot Text-to-Speech

May 10, 2026
Via arXiv

When2Speak: A Dataset for Temporal Participation and Turn-Taking in Multi-Party Conversations for Large Language Models

May 07, 2026
Via arXiv

X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning

May 07, 2026
Via arXiv

XL-SafetyBench: A Country-Grounded Cross-Cultural Benchmark for LLM Safety and Cultural Sensitivity

May 07, 2026
Via arXiv

PersonaGesture: Single-Reference Co-Speech Gesture Personalization for Unseen Speakers

May 07, 2026
Via arXiv

Assessing Cognitive Effort in L2 Idiomatic Processing: An Eye-Tracking Dataset

May 06, 2026
Via arXiv

SignVerse-2M: A Two-Million-Clip Pose-Native Universe of 55+ Sign Languages

May 06, 2026
Via arXiv