Yinfei Yang

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Apr 11, 2024
Haotian Zhang, Haoxuan You, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Apr 08, 2024
Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Mar 22, 2024
Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang

How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts

Feb 20, 2024
Yusu Qian, Haotian Zhang, Yinfei Yang, Zhe Gan

Ferret: Refer and Ground Anything Anywhere at Any Granularity

Oct 11, 2023
Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang

From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions

Oct 11, 2023
Zhengfeng Lai, Haotian Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang, Meng Cao

Compressing LLMs: The Truth is Rarely Pure and Never Simple

Oct 02, 2023
Ajay Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, Yinfei Yang

Guiding Instruction-based Image Editing via Multimodal Large Language Models

Sep 29, 2023
Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, Zhe Gan
