ALBEF


Learning to Fuse: Modality-Aware Adaptive Scheduling for Robust Multimodal Foundation Models

Jun 15, 2025

FALCON: False-Negative Aware Learning of Contrastive Negatives in Vision-Language Pretraining

May 19, 2025

Barking Up The Syntactic Tree: Enhancing VLM Training with Syntactic Losses

Dec 11, 2024

Nearest Neighbor Normalization Improves Multimodal Retrieval

Oct 31, 2024

Knowledge-grounded Adaptation Strategy for Vision-language Models: Building Unique Case-set for Screening Mammograms for Residents Training

May 30, 2024

Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM

Apr 29, 2024

Learning from Models and Data for Visual Grounding

Mar 20, 2024

LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrieval

Mar 16, 2024

Improving Adversarial Transferability of Visual-Language Pre-training Models through Collaborative Multimodal Interaction

Mar 16, 2024

Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control

Feb 27, 2024