Chunyuan Li

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Nov 09, 2023
Via arXiv

Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images

Nov 02, 2023
Via arXiv

LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Nov 01, 2023
Via arXiv

Large Language Models are Visual Reasoning Coordinators

Oct 23, 2023
Via arXiv

BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys

Oct 21, 2023
Via arXiv

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

Oct 17, 2023
Via arXiv

Improved Baselines with Visual Instruction Tuning

Oct 05, 2023
Via arXiv

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

Oct 03, 2023
Via arXiv

Aligning Large Multimodal Models with Factually Augmented RLHF

Sep 25, 2023
Via arXiv

An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

Sep 18, 2023
Via arXiv