Jonas Beskow

Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views

Oct 26, 2025
Via arXiv

Gesture Evaluation in Virtual Reality

Sep 16, 2025
Via arXiv

Towards Context-Aware Human-like Pointing Gestures with RL Motion Imitation

Sep 16, 2025
Via arXiv

Learning to Generate Pointing Gestures in Situated Embodied Conversational Agents

Sep 15, 2025
Via arXiv

Incorporating Spatial Awareness in Data-Driven Gesture Generation for Virtual Agents

Aug 07, 2024
Via arXiv

Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech

Jun 08, 2024
Via arXiv

Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis

Apr 30, 2024
Via arXiv

Unified speech and gesture synthesis using flow matching

Oct 08, 2023
Via arXiv

Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation

Sep 11, 2023
Via arXiv

Matcha-TTS: A fast TTS architecture with conditional flow matching

Sep 06, 2023
Via arXiv