Cross Modal Retrieval


Cross Modal Retrieval implements a retrieval task across different modalities, such as image-text, video-text, and audio-text retrieval. Its main challenge is the modality gap, and the key solution is to generate new representations from the different modalities in a shared subspace, so that the generated features can be compared with standard distance metrics such as cosine distance and Euclidean distance.
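The shared-subspace idea above can be sketched in a few lines. This is a minimal illustration, not any specific paper's method: the encoder outputs are random placeholders, the projection matrices are untrained (in practice they would be learned with a contrastive or ranking loss), and all dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality encoder outputs (e.g. an image encoder
# and a text encoder); sizes are illustrative only.
image_feats = rng.standard_normal((5, 512))   # 5 images, 512-d
text_feats = rng.standard_normal((5, 300))    # 5 captions, 300-d

# Linear projections into a shared 128-d subspace (random here;
# normally trained so matching pairs land close together).
W_img = rng.standard_normal((512, 128))
W_txt = rng.standard_normal((300, 128))

def project_and_normalize(x, w):
    """Map features into the shared subspace and L2-normalize,
    so that a plain dot product equals cosine similarity."""
    z = x @ w
    return z / np.linalg.norm(z, axis=1, keepdims=True)

img_z = project_and_normalize(image_feats, W_img)
txt_z = project_and_normalize(text_feats, W_txt)

# Cosine-similarity matrix: entry (i, j) scores caption j against image i.
sim = img_z @ txt_z.T

# Text-to-image retrieval: pick the best-scoring image for each caption.
best_image_per_caption = sim.argmax(axis=0)
```

Once both modalities live in the same normalized space, either direction of retrieval (image-to-text or text-to-image) reduces to a nearest-neighbor search under cosine distance.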

Disentangling and Generating Modalities for Recommendation in Missing Modality Scenarios

Apr 23, 2025

The 1st EReL@MIR Workshop on Efficient Representation Learning for Multimodal Information Retrieval

Apr 21, 2025

Improving Sound Source Localization with Joint Slot Attention on Image and Audio

Apr 21, 2025

Meta-Entity Driven Triplet Mining for Aligning Medical Vision-Language Models

Apr 22, 2025

SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs

Apr 17, 2025

ResNetVLLM-2: Addressing ResNetVLLM's Multi-Modal Hallucinations

Apr 20, 2025

Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection

Apr 20, 2025

PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage

Apr 15, 2025

HippoMM: Hippocampal-inspired Multimodal Memory for Long Audiovisual Event Understanding

Apr 14, 2025

TMCIR: Token Merge Benefits Composed Image Retrieval

Apr 15, 2025