Title: Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling

Aug 11, 2020

Jiacheng Li, Siliang Tang, Juncheng Li, Jun Xiao, Fei Wu, Shiliang Pu, Yueting Zhuang

Abstract: Visual Storytelling (VIST) is the task of telling a narrative story about a certain topic from a given photo stream. Existing studies focus on designing complex models, which rely on a huge amount of human-annotated data. However, annotating VIST data is extremely costly, and many topics cannot be covered in the training dataset due to the long-tail topic distribution. In this paper, we focus on enhancing the generalization ability of the VIST model by considering the few-shot setting. Inspired by the way humans tell a story, we propose a topic adaptive storyteller to model the ability of inter-topic generalization. In practice, we apply a gradient-based meta-learning algorithm to multi-modal seq2seq models to endow the model with the ability to adapt quickly from topic to topic. We further propose a prototype encoding structure to model the ability of intra-topic derivation. Specifically, we encode and store the few training story texts to serve as references that guide generation at inference time. Experimental results show that topic adaptation and the prototype encoding structure are mutually beneficial, improving the few-shot model on the BLEU and METEOR metrics. A further case study shows that the stories generated after few-shot adaptation are more relevant and expressive.
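To make the idea of topic-to-topic adaptation by gradient-based meta-learning more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not name the specific algorithm, so this uses a first-order approximation of a MAML-style update, and it replaces the multi-modal seq2seq model with a toy scalar model `y = w * x`. The "topics" are hypothetical regression tasks with different slopes, and all function names and hyperparameters are assumptions made for illustration.

```python
import random


def mse(w, data):
    """Mean squared error of the toy model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def mse_grad(w, data):
    """Gradient of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def adapt(w, support, inner_lr=0.1, steps=1):
    """Inner loop: a few gradient steps on the few-shot support set of one topic."""
    for _ in range(steps):
        w = w - inner_lr * mse_grad(w, support)
    return w


def sample_topic(rng, n=5):
    """Hypothetical 'topic': a task with its own slope, split into support and query sets."""
    slope = rng.uniform(1.0, 3.0)

    def draw():
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        return [(x, slope * x) for x in xs]

    return draw(), draw()


def meta_train(iterations=2000, meta_lr=0.05, seed=0):
    """Outer loop: learn an initialization w that adapts quickly to new topics."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(iterations):
        support, query = sample_topic(rng)
        w_adapted = adapt(w, support)
        # First-order approximation: apply the query gradient evaluated at the
        # adapted parameters directly to the initialization (no second derivatives).
        w -= meta_lr * mse_grad(w_adapted, query)
    return w


if __name__ == "__main__":
    w0 = meta_train()
    support, query = sample_topic(random.Random(123))
    print("meta-learned initialization:", w0)
    print("query loss before adaptation:", mse(w0, query))
    print("query loss after adaptation:", mse(adapt(w0, support), query))
```

In this toy setting the inner loop plays the role of few-shot adaptation to a new topic, while the outer loop shapes the initialization so that a single inner step already reduces the loss on held-out examples from that topic.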

* ACM Multimedia 2020
