Gabriel Skantze

MuVAP: Multimodal Multiparty Voice Activity Projection for Turn-taking Prediction in the Wild

Jun 15, 2026
Via arXiv

Face versus Body Tracking for Human-Robot Interaction: An Egocentric Dataset

Jun 02, 2026
Via arXiv

Do Factual Recall Mechanisms Carry over from Text to Speech in Multimodal Language Models?

May 21, 2026
Via arXiv

VoXtream2: Full-stream TTS with dynamic speaking rate control

Mar 13, 2026
Via arXiv

VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency

Sep 19, 2025
Via arXiv

"Dyadosyncrasy", Idiosyncrasy and Demographic Factors in Turn-Taking

May 30, 2025
Via arXiv

Representation of perceived prosodic similarity of conversational feedback

May 19, 2025
Via arXiv

What Can You Say to a Robot? Capability Communication Leads to More Natural Conversations

Feb 03, 2025
Via arXiv

Applying General Turn-taking Models to Conversational Human-Robot Interaction

Jan 15, 2025
Via arXiv

Yeah, Un, Oh: Continuous and Real-time Backchannel Prediction with Fine-tuning of Voice Activity Projection

Oct 21, 2024
Via arXiv