Title: Differentiated Relevances Embedding for Group-based Referring Expression Comprehension

Mar 12, 2022

Fuhai Chen, Xiaoshuai Sun, Xuri Ge, Jianzhuang Liu, Yongjian Wu, Feiyue Huang, Rongrong Ji

Abstract: Referring expression comprehension (REC) aims to locate the object in an image that a natural language expression refers to. To understand regions and expressions jointly, existing REC works typically focus on modeling the cross-modal relevance of each region-expression pair within a single image. In this paper, we explore a new but general REC-related problem, named Group-based REC, where the regions and expressions can come from different subject-related images (images in the same group), e.g., sets of photo albums or video frames. Unlike REC, Group-based REC involves differentiated cross-modal relevances within each group and across different groups, which, however, are neglected in the existing one-line paradigm. To this end, we propose a novel relevance-guided multi-group self-paced learning schema (termed RMSL), where the within-group region-expression pairs are adaptively assigned different priorities according to their cross-modal relevances, and the bias of the group priority is simultaneously balanced via an across-group relevance constraint. In particular, based on the visual and textual semantic features, RMSL conducts an adaptive learning cycle upon triplet ranking, where (1) the target-negative region-expression pairs with low within-group relevances are used preferentially in model training to distinguish the primary semantics of the target objects, and (2) an across-group relevance regularization is integrated into model training to balance the bias of group priority. The relevances, the pairs, and the model parameters are alternately updated upon a unified self-paced hinge loss.
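The abstract gives no formulas, so the following is only a minimal sketch of the general idea of a self-paced hinge loss over ranking triplets, not the RMSL objective itself. It assumes the classic hard self-paced rule (keep a triplet only when its loss is below a pace threshold) in place of the paper's relevance-guided priorities, and it omits the across-group regularization entirely. The function names, the margin, the pace values, and the toy scores are all hypothetical.

```python
from typing import List


def triplet_hinge(pos_score: float, neg_score: float, margin: float = 0.2) -> float:
    """Hinge ranking loss: nonzero when the negative pair scores within `margin` of the positive."""
    return max(0.0, margin - pos_score + neg_score)


def self_paced_weights(losses: List[float], pace: float) -> List[float]:
    """Generic hard self-paced rule (an assumption, not RMSL's rule):
    select a triplet (weight 1) only if its current loss is below `pace`."""
    return [1.0 if loss < pace else 0.0 for loss in losses]


def self_paced_objective(losses: List[float], weights: List[float], pace: float) -> float:
    """Weighted hinge loss minus a reward of `pace` for every selected triplet."""
    weighted = sum(w * l for w, l in zip(weights, losses))
    return weighted - pace * sum(weights)


# Toy (positive score, negative score) pairs; the numbers are made up.
scores = [(0.9, 0.1), (0.6, 0.5), (0.3, 0.8)]
losses = [triplet_hinge(p, n) for p, n in scores]

# Alternating scheme: with the model (scores) fixed, update the weights;
# a training loop would then update the model on the selected triplets
# and raise `pace` so that more triplets are admitted over time.
early_weights = self_paced_weights(losses, pace=0.3)
late_weights = self_paced_weights(losses, pace=1.0)
early_objective = self_paced_objective(losses, early_weights, pace=0.3)
```

With a small pace only the low-loss triplets are selected, and raising the pace gradually admits the rest. In RMSL as described above, the priority is instead driven by within-group cross-modal relevance, which this sketch does not model.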
