Yinfei Yang

From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions

Oct 11, 2023
Via arXiv

Compressing LLMs: The Truth is Rarely Pure and Never Simple

Oct 02, 2023
Via arXiv

Guiding Instruction-based Image Editing via Multimodal Large Language Models

Sep 29, 2023
Via arXiv

Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts

Sep 08, 2023
Via arXiv

MOFI: Learning Image Representations from Noisy Entity Annotated Images

Jun 24, 2023
Via arXiv

Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness

May 08, 2023
Via arXiv

On Robustness in Multimodal Learning

Apr 11, 2023
Via arXiv

STAIR: Learning Sparse Text and Image Representation in Grounded Tokens

Feb 08, 2023
Via arXiv

Self Supervision Does Not Help Natural Language Supervision at Scale

Jan 20, 2023
Via arXiv

Perceptual Grouping in Vision-Language Models

Oct 18, 2022
Via arXiv