Albef


Barking Up The Syntactic Tree: Enhancing VLM Training with Syntactic Losses

Dec 11, 2024

Nearest Neighbor Normalization Improves Multimodal Retrieval

Oct 31, 2024

Knowledge-grounded Adaptation Strategy for Vision-language Models: Building Unique Case-set for Screening Mammograms for Residents Training

May 30, 2024

Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM

Apr 29, 2024

Learning from Models and Data for Visual Grounding

Mar 20, 2024

Improving Adversarial Transferability of Visual-Language Pre-training Models through Collaborative Multimodal Interaction

Mar 16, 2024

LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrival

Mar 16, 2024

Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control

Feb 27, 2024

Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models

Jul 26, 2023

MultiModal Bias: Introducing a Framework for Stereotypical Bias Assessment beyond Gender and Race in Vision Language Models

Mar 16, 2023