Cross Modal Retrieval


Cross Modal Retrieval is the task of retrieving items in one modality given a query in another, for example image-text, video-text, or audio-text retrieval. The main challenge is the modality gap: features extracted from different modalities are not directly comparable. The key solution is to map each modality to new representations in a shared subspace, so that distance metrics such as cosine distance and Euclidean distance can be computed between them.
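The shared-subspace idea above can be illustrated with a minimal sketch. It assumes each modality already has a feature vector and a linear projection into a 2-d shared space; the matrices `TEXT_PROJ` and `IMAGE_PROJ`, the toy features, and the helper names are all hypothetical and hand-picked for illustration. In practice the projections would be learned, and they need not be linear.

```python
import math

def project(vec, weights):
    """Map a feature vector into the shared space with a linear projection.

    `weights` holds one row per shared-space dimension (hypothetical values;
    a real system would learn them).
    """
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two points in the shared space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, candidates, distance):
    """Return candidate indices sorted from nearest to farthest."""
    return sorted(range(len(candidates)),
                  key=lambda i: distance(query, candidates[i]))

# Assumed projections: 3-d text features and 4-d image features -> 2-d shared space.
TEXT_PROJ = [[1.0, 0.0, 0.0],
             [0.0, 1.0, 1.0]]
IMAGE_PROJ = [[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]]

# Toy features: one text query and three candidate images.
text_feature = [1.0, 0.0, 0.0]
image_features = [
    [0.0, 0.0, 2.0, 2.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]

# Project both modalities into the shared space, then rank images for the text query.
text_shared = project(text_feature, TEXT_PROJ)
image_shared = [project(f, IMAGE_PROJ) for f in image_features]

ranking_cosine = retrieve(text_shared, image_shared, cosine_distance)
ranking_euclid = retrieve(text_shared, image_shared, euclidean_distance)
print(ranking_cosine, ranking_euclid)
```

Once both modalities live in the same space, the raw text and image features, which have different dimensions here, become comparable with either metric. The two metrics can produce different rankings on other data, since cosine distance ignores vector length while Euclidean distance does not.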

LaVPR: Benchmarking Language and Vision for Place Recognition

Feb 03, 2026

TextME: Bridging Unseen Modalities Through Text Descriptions

Feb 03, 2026

Toward Effective Multimodal Graph Foundation Model: A Divide-and-Conquer Based Approach

Feb 04, 2026

VILLAIN at AVerImaTeC: Verifying Image-Text Claims via Multi-Agent Collaboration

Feb 04, 2026

Cross-Temporal Attention Fusion (CTAF) for Multimodal Physiological Signals in Self-Supervised Learning

Feb 02, 2026

ReCALL: Recalibrating Capability Degradation for MLLM-based Composed Image Retrieval

Feb 02, 2026

RASST: Fast Cross-modal Retrieval-Augmented Simultaneous Speech Translation

Jan 30, 2026

Contrastive Domain Generalization for Cross-Instrument Molecular Identification in Mass Spectrometry

Jan 31, 2026

Beyond Global Alignment: Fine-Grained Motion-Language Retrieval via Pyramidal Shapley-Taylor Learning

Jan 29, 2026

CoVA: Text-Guided Composed Video Retrieval for Audio-Visual Content

Jan 30, 2026