
Yinfei Yang

Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts

Sep 08, 2023
Erik Daxberger, Floris Weers, Bowen Zhang, Tom Gunter, Ruoming Pang, Marcin Eichner, Michael Emmersberger, Yinfei Yang, Alexander Toshev, Xianzhi Du

Via arXiv

MOFI: Learning Image Representations from Noisy Entity Annotated Images

Jun 24, 2023
Wentao Wu, Aleksei Timofeev, Chen Chen, Bowen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jon Shlens, Xianzhi Du, Zhe Gan, Yinfei Yang

Via arXiv

Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness

May 08, 2023
Liangliang Cao, Bowen Zhang, Chen Chen, Yinfei Yang, Xianzhi Du, Wencong Zhang, Zhiyun Lu, Yantao Zheng

Via arXiv

On Robustness in Multimodal Learning

Apr 11, 2023
Brandon McKinzie, Joseph Cheng, Vaishaal Shankar, Yinfei Yang, Jonathon Shlens, Alexander Toshev

Via arXiv

STAIR: Learning Sparse Text and Image Representation in Grounded Tokens

Feb 08, 2023
Chen Chen, Bowen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Madappally Jose, Alexander Toshev, Jonathon Shlens, Ruoming Pang, Yinfei Yang

Via arXiv

Self Supervision Does Not Help Natural Language Supervision at Scale

Jan 20, 2023
Floris Weers, Vaishaal Shankar, Angelos Katharopoulos, Yinfei Yang, Tom Gunter

Via arXiv

Perceptual Grouping in Vision-Language Models

Oct 18, 2022
Kanchana Ranasinghe, Brandon McKinzie, Sachin Ravi, Yinfei Yang, Alexander Toshev, Jonathon Shlens

Via arXiv

A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning

Oct 06, 2022
Aishwarya Kamath, Peter Anderson, Su Wang, Jing Yu Koh, Alexander Ku, Austin Waters, Yinfei Yang, Jason Baldridge, Zarana Parekh

Via arXiv