Xihua Wang

Unified Synthesis of Compositional Speech and Sound from Free-Form Text Prompts

May 27, 2026

SyncDPO: Enhancing Temporal Synchronization in Video-Audio Joint Generation via Preference Learning

May 12, 2026

Qwen-Image-2.0 Technical Report

May 11, 2026

Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization

Dec 26, 2024

Two-in-One: Unified Multi-Person Interactive Motion Generation by Latent Diffusion Transformer

Dec 21, 2024

SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition

Jan 31, 2024