Title: Visual Composite Set Detection Using Part-and-Sum Transformers

May 05, 2021

Qi Dong, Zhuowen Tu, Haofu Liao, Yuting Zhang, Vijay Mahadevan, Stefano Soatto

Abstract: Computer vision applications such as visual relationship detection and human-object interaction can be formulated as a composite (structured) set detection problem in which both the parts (subject, object, and predicate) and the sum (the triplet as a whole) are to be detected in a hierarchical fashion. In this paper, we present a new approach, denoted Part-and-Sum detection Transformer (PST), to perform end-to-end composite set detection. Unlike existing Transformers, in which queries are at a single level, we simultaneously model the joint part and sum hypotheses/interactions with composite queries and attention modules. We explicitly incorporate sum queries to enable better modeling of the part-and-sum relations that are absent in standard Transformers. Our approach also uses novel tensor-based part queries and vector-based sum queries, and models their joint interaction. We report experiments on two vision tasks, visual relationship detection and human-object interaction, and demonstrate that PST achieves state-of-the-art results among single-stage models, while nearly matching the results of custom-designed two-stage models.
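The composite-query idea in the abstract can be made concrete with a small shape-level sketch. This is only an illustration under stated assumptions, not the authors' implementation: the helper names (`make_queries`, `sum_attends_to_parts`), the random initialization, and the single dot-product attention step with a residual update are all hypothetical choices made here. The sketch shows part queries stored as a tensor of shape (queries, 3 parts, dim), sum queries stored as one vector per query, and one possible way a sum query could interact with its own parts.

```python
import math
import random

# The three parts named in the abstract; the ordering here is an assumption.
PARTS = ("subject", "predicate", "object")


def make_queries(num_queries, dim, seed=0):
    """Build hypothetical composite queries as nested lists.

    part_q has shape (num_queries, len(PARTS), dim): a tensor-like part query.
    sum_q has shape (num_queries, dim): one vector per triplet hypothesis.
    """
    rng = random.Random(seed)
    part_q = [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in PARTS]
        for _ in range(num_queries)
    ]
    sum_q = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_queries)]
    return part_q, sum_q


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sum_attends_to_parts(sum_vec, part_vecs):
    """One assumed interaction step: the sum query attends over its own parts.

    Uses scaled dot-product attention followed by a residual update.
    Returns the updated sum vector and the attention weights over the parts.
    """
    dim = len(sum_vec)
    scores = [
        sum(s * p for s, p in zip(sum_vec, part)) / math.sqrt(dim)
        for part in part_vecs
    ]
    weights = softmax(scores)
    mixed = [
        sum(w * part[i] for w, part in zip(weights, part_vecs))
        for i in range(dim)
    ]
    updated = [s + m for s, m in zip(sum_vec, mixed)]
    return updated, weights
```

Calling `make_queries(4, 8)` yields four triplet hypotheses, each with three 8-dimensional part vectors and one 8-dimensional sum vector; `sum_attends_to_parts(sum_q[0], part_q[0])` then returns an updated sum vector together with three weights that sum to one. A real model would use learned embeddings and full attention modules rather than this toy step.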
