Title: WebQA: Multihop and Multimodal QA

Sep 21, 2021

Yingshan Chang, Mridu Narang, Hisami Suzuki, Guihong Cao, Jianfeng Gao, Yonatan Bisk

Abstract: Web search is fundamentally multimodal and multihop. Often, even before asking a question, we choose to go directly to image search to find our answers. Further, we rarely find an answer in a single source; instead, we aggregate information and reason through implications. Despite the frequency of this everyday occurrence, there is at present no unified question answering benchmark that requires a single model to answer long-form natural language questions from text and open-ended visual sources, akin to a human's experience. We propose to bridge this gap between the natural language and computer vision communities with WebQA. We show that (a) our multihop text queries are difficult for a large-scale transformer model, and (b) existing multimodal transformers and visual representations do not perform well on open-domain visual queries. Our challenge for the community is to create a unified multimodal reasoning model that seamlessly transitions and reasons regardless of the source modality.
