Title: GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization

Apr 26, 2021

Jia-Hong Huang, Luka Murn, Marta Mrak, Marcel Worring

Abstract: Traditional video summarization methods generate fixed video representations regardless of user interest, so they fall short of user expectations in content search and exploration scenarios. Multi-modal video summarization is one approach to this problem. When multi-modal video summarization is used to support video exploration, a user-defined text-based query is one of the main drivers of summary generation. Effectively encoding both the text-based query and the video is therefore important for this task. In this work, a new method is proposed that tackles the task with a specialized attention network and contextualized word representations. The proposed model consists of a contextualized video summary controller, multi-modal attention mechanisms, an interactive attention network, and a video summary generator. On an existing multi-modal video summarization benchmark, experimental results show that the proposed model is effective, improving accuracy by +5.88% and F1-score by +4.06% over the state-of-the-art method.
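To make the task setting concrete, here is a minimal, generic sketch of query-guided frame selection and of frame-level accuracy and F1-score, the two metrics the abstract reports. It is NOT the GPT2MVS model. It assumes a single pre-computed query vector (standing in for the contextualized GPT-2 word representations) and per-frame feature vectors of the same size. It scores frames with plain scaled dot-product attention, and it keeps the frames whose attention weight is at or above the uniform weight 1/N. The function names, the toy vectors, and the thresholding rule are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def query_frame_attention(query: Sequence[float],
                          frames: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical: scaled dot-product attention weights of one text query over frame features."""
    d = len(query)
    scores = [sum(q * f for q, f in zip(query, frame)) / math.sqrt(d) for frame in frames]
    return softmax(scores)


def select_frames(weights: Sequence[float], threshold: float) -> List[int]:
    """Mark a frame as part of the summary (1) if its weight reaches the threshold."""
    return [1 if w >= threshold else 0 for w in weights]


def accuracy_and_f1(pred: Sequence[int], gold: Sequence[int]) -> Tuple[float, float]:
    """Frame-level accuracy and F1-score of a binary summary against ground truth."""
    tp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gold) if p == 0 and g == 1)
    accuracy = sum(1 for p, g in zip(pred, gold) if p == g) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy example: frames 0 and 2 point in the same direction as the query.
query = [1.0, 0.0]
frames = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
weights = query_frame_attention(query, frames)
summary = select_frames(weights, threshold=1.0 / len(frames))
```

With these toy vectors, frames 0 and 2 receive the largest weights and are selected. Scoring that selection against a ground truth that marks only frame 0 gives an accuracy of 0.75 and an F1-score of about 0.667, because one of the two selected frames is a false positive.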

* This paper has been accepted at the ACM International Conference on Multimedia Retrieval (ICMR) 2021.
