Title: Progressive Learning for Image Retrieval with Hybrid-Modality Queries

Apr 24, 2022

Yida Zhao, Yuqing Song, Qin Jin

Abstract: Image retrieval with hybrid-modality queries, also known as composing text and image for image retrieval (CTI-IR), is a retrieval task in which the search intention is expressed in a more complex query format involving both the vision and text modalities. For example, a target product image is searched for using a reference product image together with text describing changes to certain attributes of the reference image. It is a more challenging image retrieval task that requires both semantic space learning and cross-modal fusion. Previous approaches that attempt to deal with both aspects at once achieve unsatisfactory performance. In this paper, we decompose the CTI-IR task into a three-stage learning problem to progressively learn the complex knowledge for image retrieval with hybrid-modality queries. We first learn the semantic embedding space from open-domain image-text retrieval, and then transfer the learned knowledge to the fashion domain with fashion-related pre-training tasks. Finally, we extend the pre-trained model from single-modality queries to hybrid-modality queries for the CTI-IR task. Furthermore, as the contribution of each modality in the hybrid-modality query varies across retrieval scenarios, we propose a self-supervised adaptive weighting strategy to dynamically determine the importance of image and text in the hybrid-modality query for better retrieval. Extensive experiments show that our proposed model significantly outperforms state-of-the-art methods in the mean of Recall@K by 24.9% and 9.5% on the Fashion-IQ and Shoes benchmark datasets, respectively.
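The headline numbers in the abstract are reported as a mean of Recall@K. As a rough illustration of that metric (not the authors' evaluation code), the sketch below computes Recall@K for a batch of queries and averages it over several K values. The function names `recall_at_k` and `mean_recall`, the toy rankings, and the choice of K values are all assumptions made for this example; the K values actually used for each benchmark are defined in the paper.

```python
from typing import Sequence


def recall_at_k(ranked_ids: Sequence[Sequence[int]],
                target_ids: Sequence[int],
                k: int) -> float:
    """Fraction of queries whose target image id is among the top-k results.

    ranked_ids[i] is the list of candidate image ids for query i, sorted from
    most to least similar. target_ids[i] is the single correct image id.
    """
    if len(ranked_ids) != len(target_ids):
        raise ValueError("need exactly one target per query")
    if not target_ids:
        raise ValueError("need at least one query")
    hits = sum(
        1 for ranking, target in zip(ranked_ids, target_ids)
        if target in ranking[:k]
    )
    return hits / len(target_ids)


def mean_recall(ranked_ids: Sequence[Sequence[int]],
                target_ids: Sequence[int],
                ks: Sequence[int]) -> float:
    """Average of Recall@K over the given K values."""
    if not ks:
        raise ValueError("need at least one K value")
    return sum(recall_at_k(ranked_ids, target_ids, k) for k in ks) / len(ks)


# Toy example (hypothetical data): three queries over four candidate images.
rankings = [
    [3, 1, 2, 0],  # query 0: target 3 is ranked first
    [0, 2, 1, 3],  # query 1: target 2 is ranked second
    [2, 3, 0, 1],  # query 2: target 1 is ranked last
]
targets = [3, 2, 1]

r_at_1 = recall_at_k(rankings, targets, k=1)   # 1 of 3 queries hit
r_at_2 = recall_at_k(rankings, targets, k=2)   # 2 of 3 queries hit
avg = mean_recall(rankings, targets, ks=(1, 2))
```

Averaging over several K values rewards a model both for placing the target at the very top and for keeping it within a longer shortlist, which is why a single mean is convenient for comparing methods.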

* Accepted by SIGIR 2022
