Siyang Wang

From Sequential to Spatial: Reordering Autoregression for Efficient Visual Generation

Dec 31, 2025
Via arXiv

Towards Context-Aware Human-like Pointing Gestures with RL Motion Imitation

Sep 16, 2025
Via arXiv

Learning to Generate Pointing Gestures in Situated Embodied Conversational Agents

Sep 15, 2025
Via arXiv

Varformer: Adapting VAR's Generative Prior for Image Restoration

Dec 30, 2024
Via arXiv

Enhancing Layout Hotspot Detection Efficiency with YOLOv8 and PCA-Guided Augmentation

Jul 19, 2024
Via arXiv

Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model

May 16, 2024
Via arXiv

On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis

Jul 11, 2023
Via arXiv

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

Jun 15, 2023
Via arXiv

Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis

May 29, 2023
Via arXiv

A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS

Mar 05, 2023
Via arXiv