Yinfei Yang

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

Oct 03, 2024

Contrastive Localized Language-Image Pre-Training

Oct 03, 2024

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Sep 30, 2024

Understanding Alignment in Multimodal LLMs: A Comprehensive Study

Jul 02, 2024

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

Jul 01, 2024

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Apr 11, 2024

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Apr 08, 2024

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Mar 22, 2024

How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts

Feb 20, 2024

Ferret: Refer and Ground Anything Anywhere at Any Granularity

Oct 11, 2023