Title: Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering

Feb 16, 2020

Lei Shi, Shijie Geng, Kai Shuang, Chiori Hori, Songxiang Liu, Peng Gao, Sen Su

Figures 1-4 (images not reproduced)

Abstract: Multi-modality fusion technologies have greatly improved the performance of neural-network-based Video Description/Captioning, Visual Question Answering (VQA), and Audio Visual Scene-aware Dialog (AVSD) in recent years. Most previous approaches fuse only the features of the last layers in multi-layer feature fusion, overlooking the importance of the intermediate layers. To address this, we propose an efficient Quaternion Block Network (QBN) that learns interactions not only for the last layer but for all intermediate layers simultaneously. In QBN, holistic text features guide the update of the visual features, while Hamilton quaternion products efficiently propagate information from higher layers to lower layers for both the visual and text modalities. The evaluation results show that QBN improves performance on VQA 2.0, even surpassing approaches that use large-scale BERT or visual BERT pre-trained models. Extensive ablation studies were carried out to test the influence of each proposed module.
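The abstract relies on the Hamilton quaternion product but does not say how QBN lays out its features or where exactly the product is applied. The sketch below shows only the product itself, plus one hypothetical way to apply it element-wise to two feature vectors. It assumes a layout common in quaternion neural networks: a feature of length 4n is read as n quaternions, with the first n entries as real parts, then the i, j, and k parts. The helper names `split_quaternions`, `merge_quaternions`, and `quaternion_interact` are illustrative, not taken from the paper.

```python
from typing import List, Tuple

# A quaternion a + b*i + c*j + d*k stored as (a, b, c, d).
Quaternion = Tuple[float, float, float, float]


def hamilton(p: Quaternion, q: Quaternion) -> Quaternion:
    """Hamilton product p * q (non-commutative: i*j = k but j*i = -k)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    )


def split_quaternions(x: List[float]) -> List[Quaternion]:
    """Hypothetical layout: read a length-4n vector as n quaternions
    (first n entries = real parts, next n = i parts, then j, then k)."""
    if len(x) % 4 != 0:
        raise ValueError("feature length must be a multiple of 4")
    n = len(x) // 4
    return [(x[t], x[n + t], x[2 * n + t], x[3 * n + t]) for t in range(n)]


def merge_quaternions(qs: List[Quaternion]) -> List[float]:
    """Inverse of split_quaternions: all real parts, then i, j, k parts."""
    return [q[c] for c in range(4) for q in qs]


def quaternion_interact(text_feat: List[float], visual_feat: List[float]) -> List[float]:
    """Hypothetical interaction: element-wise Hamilton product of two
    equal-length features, each viewed as n quaternions."""
    if len(text_feat) != len(visual_feat):
        raise ValueError("features must have the same length")
    pairs = zip(split_quaternions(text_feat), split_quaternions(visual_feat))
    return merge_quaternions([hamilton(p, q) for p, q in pairs])
```

For example, `hamilton((1, 2, 3, 4), (5, 6, 7, 8))` gives `(-60, 12, 30, 24)`. Unlike an element-wise (Hadamard) product, each output component mixes all four input components of both operands, which is the property that makes the product attractive for letting information from one representation interact with every part of another.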
