Zhe Gan

How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts

Feb 20, 2024
Yusu Qian, Haotian Zhang, Yinfei Yang, Zhe Gan

InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models

Dec 21, 2023
Bingbing Wen, Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Bill Howe, Lijuan Wang

Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation

Nov 27, 2023
Yuhui Zhang, Brandon McKinzie, Zhe Gan, Vaishaal Shankar, Alexander Toshev

Ferret: Refer and Ground Anything Anywhere at Any Granularity

Oct 11, 2023
Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang

From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions

Oct 11, 2023
Zhengfeng Lai, Haotian Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang, Meng Cao

Compressing LLMs: The Truth is Rarely Pure and Never Simple

Oct 02, 2023
Ajay Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, Yinfei Yang

Guiding Instruction-based Image Editing via Multimodal Large Language Models

Sep 29, 2023
Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, Zhe Gan

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

Sep 18, 2023
Chunyuan Li, Zhe Gan, Zhengyuan Yang, Jianwei Yang, Linjie Li, Lijuan Wang, Jianfeng Gao

MOFI: Learning Image Representations from Noisy Entity Annotated Images

Jun 24, 2023
Wentao Wu, Aleksei Timofeev, Chen Chen, Bowen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jon Shlens, Xianzhi Du, Zhe Gan, Yinfei Yang
