Fangxun Shu

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

Jul 11, 2024
Via arXiv

Autoregressive Pretraining with Mamba in Vision

Jun 11, 2024
Via arXiv

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models

Mar 20, 2024
Via arXiv

Audio-Visual LLM for Video Understanding

Dec 13, 2023
Via arXiv

Compress & Align: Curating Image-Text Data with Human Knowledge

Dec 13, 2023
Via arXiv

Masked Contrastive Pre-Training for Efficient Video-Text Retrieval

Dec 05, 2022
Via arXiv