Yuxin Guo

MM-Sonate: Multimodal Controllable Audio-Video Generation with Zero-Shot Voice Cloning

Jan 08, 2026
Via arXiv

Klear: Unified Multi-Task Audio-Video Joint Generation

Jan 07, 2026
Via arXiv

CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving

Oct 09, 2025
Via arXiv

AudioStory: Generating Long-Form Narrative Audio with Large Language Models

Aug 27, 2025
Via arXiv

ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving

May 26, 2025
Via arXiv

Parallel Layer Normalization for Universal Approximation

May 19, 2025
Via arXiv

Aligned Better, Listen Better for Audio-Visual Large Language Models

Apr 02, 2025
Via arXiv

GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers

Mar 25, 2025
Via arXiv

Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion

Feb 20, 2025
Via arXiv

MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction

Feb 17, 2025
Via arXiv