Hanwang Zhang

Visual Prompt Selection for In-Context Learning Segmentation

Jul 14, 2024
Via arXiv

ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models

Jun 16, 2024
Via arXiv

EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

Jun 13, 2024
Via arXiv

MVGamba: Unify 3D Content Generation as State Space Sequence Modeling

Jun 10, 2024
Via arXiv

Towards Semantic Equivalence of Tokenization in Multimodal LLM

Jun 07, 2024
Via arXiv

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

May 27, 2024
Via arXiv

Non-confusing Generation of Customized Concepts in Diffusion Models

May 11, 2024
Via arXiv

Auto-Encoding Morph-Tokens for Multimodal LLM

May 03, 2024
Via arXiv

Dual-Modal Prompting for Sketch-Based Image Retrieval

Apr 29, 2024
Via arXiv

Diffusion Time-step Curriculum for One Image to 3D Generation

Apr 11, 2024
Via arXiv