Hanwang Zhang

Few-shot NeRF by Adaptive Rendering Loss Regularization

Oct 23, 2024
Via arXiv

Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration

Sep 30, 2024
Via arXiv

Instruction Tuning-free Visual Token Complement for Multimodal LLMs

Aug 09, 2024
Via arXiv

Selective Vision-Language Subspace Projection for Few-shot CLIP

Jul 26, 2024
Via arXiv

Visual Prompt Selection for In-Context Learning Segmentation

Jul 14, 2024
Via arXiv

ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models

Jun 16, 2024
Via arXiv

EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

Jun 13, 2024
Via arXiv

MVGamba: Unify 3D Content Generation as State Space Sequence Modeling

Jun 10, 2024
Via arXiv

Towards Semantic Equivalence of Tokenization in Multimodal LLM

Jun 07, 2024
Via arXiv

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

May 27, 2024
Via arXiv