A Caption Is Worth A Thousand Images: Investigating Image Captions for Multimodal Named Entity Recognition

Oct 23, 2020
Shuguang Chen, Gustavo Aguilar, Leonardo Neves, Thamar Solorio

Multimodal named entity recognition (MNER) requires bridging the gap between language understanding and visual context. Due to advances in natural language processing (NLP) and computer vision (CV), many neural techniques have been proposed to incorporate images into the NER task. In this work, we conduct a detailed analysis of current state-of-the-art fusion techniques for MNER and describe scenarios where adding information from the image does not always result in boosts in performance. We also study the use of captions as a way to enrich the context for MNER. We provide extensive empirical analysis and an ablation study on three datasets from popular social platforms to expose the situations where the approach is beneficial.

* 8 pages, 2 figures  