Junshi Huang

JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing

Jan 03, 2025
Via arXiv

FLUX that Plays Music

Sep 01, 2024
Via arXiv

Scaling Diffusion Transformers to 16 Billion Parameters

Jul 16, 2024
Via arXiv

Dimba: Transformer-Mamba Diffusion Models

Jun 03, 2024
Via arXiv

Music Consistency Models

Apr 20, 2024
Via arXiv

Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models

Apr 06, 2024
Via arXiv

Scalable Diffusion Models with State Space Backbone

Feb 25, 2024
Via arXiv

Tuning-Free Inversion-Enhanced Control for Consistent Image Editing

Dec 22, 2023
Via arXiv

A-JEPA: Joint-Embedding Predictive Architecture Can Listen

Nov 28, 2023
Via arXiv

Enriching Phrases with Coupled Pixel and Object Contexts for Panoptic Narrative Grounding

Nov 02, 2023
Via arXiv