Image Retrieval


MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval

May 26, 2025
Via arXiv

Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval

May 26, 2025
Via arXiv

Can Visual Encoder Learn to See Arrows?

May 26, 2025
Via arXiv

Hard Negative Contrastive Learning for Fine-Grained Geometric Understanding in Large Multimodal Models

May 26, 2025
Via arXiv

Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models

May 26, 2025
Via arXiv

BRIT: Bidirectional Retrieval over Unified Image-Text Graph

May 24, 2025
Via arXiv

TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP

May 24, 2025
Via arXiv

EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models

May 24, 2025
Via arXiv

Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection

May 25, 2025
Via arXiv

Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation

May 24, 2025
Via arXiv