Title: Cross-modal Variational Auto-encoder for Content-based Micro-video Background Music Recommendation

Jul 15, 2021

Jing Yi, Yaochen Zhu, Jiayi Xie, Zhenzhong Chen

Abstract: In this paper, we propose a cross-modal variational auto-encoder (CMVAE) for content-based micro-video background music recommendation. CMVAE is a hierarchical Bayesian generative model that matches relevant background music to a micro-video by projecting these two multimodal inputs into a shared low-dimensional latent space, where the two embeddings of a matched video-music pair are aligned by cross-generation. Moreover, the multimodal information is fused by the product-of-experts (PoE) principle, where the semantic information in the visual and textual modalities of the micro-video is weighted according to their variance estimates, such that the modality with the lower noise level is given more weight. As a result, the micro-video latent variables contain less irrelevant information, which leads to more robust model generalization. Furthermore, we establish a large-scale content-based micro-video background music recommendation dataset, TT-150k, composed of approximately 3,000 different background music clips associated with 150,000 micro-videos from different users. Extensive experiments on the established TT-150k dataset demonstrate the effectiveness of the proposed method. A qualitative assessment of CMVAE, visualizing some recommendation results, is also included.
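
The variance-based weighting mentioned in the abstract can be illustrated with a minimal sketch of product-of-experts fusion. It assumes each modality encoder outputs a diagonal Gaussian and that no prior expert is included; the function name `poe_fuse` and the numeric values are hypothetical and are not taken from the paper or its code.

```python
import numpy as np

def poe_fuse(mus, variances):
    """Fuse diagonal Gaussian experts by precision weighting (illustrative only)."""
    mus = np.asarray(mus, dtype=float)              # shape: (experts, dims)
    variances = np.asarray(variances, dtype=float)  # shape: (experts, dims)
    precisions = 1.0 / variances                    # low variance -> high weight
    fused_var = 1.0 / precisions.sum(axis=0)        # precisions add under a product
    fused_mu = fused_var * (precisions * mus).sum(axis=0)
    return fused_mu, fused_var

# Hypothetical encoder outputs for one micro-video (assumed values).
visual_mu, visual_var = np.array([1.0, 0.0]), np.array([0.25, 1.0])
textual_mu, textual_var = np.array([0.0, 2.0]), np.array([1.0, 1.0])

fused_mu, fused_var = poe_fuse([visual_mu, textual_mu], [visual_var, textual_var])
print(fused_mu, fused_var)
```

In the first dimension the visual expert has the lower variance (0.25 versus 1.0), so the fused mean of 0.8 lies closer to the visual mean of 1.0 than to the textual mean of 0.0. In the second dimension the variances are equal, so the fused mean is the simple average, 1.0.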
