Vasudev Lal

Why do LLaVA Vision-Language Models Reply to Images in English?

Jul 02, 2024

SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs

Jun 28, 2024

L-MAGIC: Language Model Assisted Generation of Images with Coherence

Jun 03, 2024

LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models

Apr 03, 2024

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Apr 01, 2024

LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

Mar 29, 2024

Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples

Nov 30, 2023

NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation

Nov 20, 2023

LDM3D-VR: Latent Diffusion Model for 3D VR

Nov 06, 2023

Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks

Oct 07, 2023