
Shanghang Zhang

MC-LLaVA: Multi-Concept Personalized Vision-Language Model

Nov 18, 2024 (via arXiv)

Learning from Different Samples: A Source-free Framework for Semi-supervised Domain Adaptation

Nov 11, 2024 (via arXiv)

Training-free Regional Prompting for Diffusion Transformers

Nov 04, 2024 (via arXiv)

Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective

Oct 29, 2024 (via arXiv)

Subgraph Aggregation for Out-of-Distribution Generalization on Graphs

Oct 29, 2024 (via arXiv)

EVA: An Embodied World Model for Future Video Anticipation

Oct 20, 2024 (via arXiv)

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

Oct 06, 2024 (via arXiv)

Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation

Sep 24, 2024 (via arXiv)

FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models

Aug 15, 2024 (via arXiv)

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

Jul 30, 2024 (via arXiv)