Yinfei Yang

Ferret: Refer and Ground Anything Anywhere at Any Granularity

Oct 11, 2023
Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang

From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions

Oct 11, 2023
Zhengfeng Lai, Haotian Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang, Meng Cao

Compressing LLMs: The Truth is Rarely Pure and Never Simple

Oct 02, 2023
Ajay Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, Yinfei Yang

Guiding Instruction-based Image Editing via Multimodal Large Language Models

Sep 29, 2023
Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, Zhe Gan

Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts

Sep 08, 2023
Erik Daxberger, Floris Weers, Bowen Zhang, Tom Gunter, Ruoming Pang, Marcin Eichner, Michael Emmersberger, Yinfei Yang, Alexander Toshev, Xianzhi Du

MOFI: Learning Image Representations from Noisy Entity Annotated Images

Jun 24, 2023
Wentao Wu, Aleksei Timofeev, Chen Chen, Bowen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jon Shlens, Xianzhi Du, Zhe Gan, Yinfei Yang

Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness

May 08, 2023
Liangliang Cao, Bowen Zhang, Chen Chen, Yinfei Yang, Xianzhi Du, Wencong Zhang, Zhiyun Lu, Yantao Zheng

On Robustness in Multimodal Learning

Apr 11, 2023
Brandon McKinzie, Joseph Cheng, Vaishaal Shankar, Yinfei Yang, Jonathon Shlens, Alexander Toshev

STAIR: Learning Sparse Text and Image Representation in Grounded Tokens

Feb 08, 2023
Chen Chen, Bowen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Madappally Jose, Alexander Toshev, Jonathon Shlens, Ruoming Pang, Yinfei Yang
