Title: UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning

Dec 31, 2020

Wei Li, Can Gao, Guocheng Niu, Xinyan Xiao, Hao Liu, Jiachen Liu, Hua Wu, Haifeng Wang

[Figures 1 to 4 of the paper: images not reproduced]

Abstract: Existing pre-training methods focus on either single-modal tasks or multi-modal tasks, and cannot effectively adapt from one to the other. They can only utilize single-modal data (i.e., text or images) or limited multi-modal data (i.e., image-text pairs). In this work, we propose a unified-modal pre-training architecture, namely UNIMO, which can effectively adapt to both single-modal and multi-modal understanding and generation tasks. Large-scale free-text corpora and image collections can be utilized to improve visual and textual understanding, and cross-modal contrastive learning (CMCL) is leveraged to align textual and visual information into a unified semantic space over a corpus of image-text pairs. Because non-paired single-modal data is abundant, our model can utilize much larger-scale data to learn more generalizable representations. Moreover, textual knowledge and visual knowledge can enhance each other in the unified semantic space. Experimental results show that UNIMO significantly improves performance on several single-modal and multi-modal downstream tasks.
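The abstract says CMCL aligns text and images into one semantic space using image-text pairs. As a rough illustration of what a cross-modal contrastive objective computes, here is a minimal sketch of a generic symmetric InfoNCE loss in pure Python. It is an assumption for illustration, not UNIMO's actual CMCL: the paper's objective, including how it builds positive and negative pairs, may differ. The function names and the temperature value of 0.07 are hypothetical choices. Matched pairs (text i, image i) are treated as positives, and every other image (or text) in the batch is treated as a negative.

```python
import math


def l2_normalize(vec):
    # Scale a (nonzero) vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_sim_matrix(text_embs, image_embs):
    # sim[r][c] = cosine similarity between text r and image c.
    texts = [l2_normalize(v) for v in text_embs]
    images = [l2_normalize(v) for v in image_embs]
    return [[sum(a * b for a, b in zip(t, i)) for i in images] for t in texts]


def info_nce(sim, temperature=0.07):
    # For anchor row r, column r is the positive and all other columns are negatives.
    # Loss per row = -log softmax(sim[r] / temperature)[r], averaged over rows.
    n = len(sim)
    total = 0.0
    for r in range(n):
        logits = [s / temperature for s in sim[r]]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[r]
    return total / n


def symmetric_contrastive_loss(text_embs, image_embs, temperature=0.07):
    # Average the text-to-image and image-to-text directions (hypothetical setup).
    sim = cosine_sim_matrix(text_embs, image_embs)
    sim_transposed = [list(col) for col in zip(*sim)]
    return 0.5 * (info_nce(sim, temperature) + info_nce(sim_transposed, temperature))


if __name__ == "__main__":
    texts = [[1.0, 0.0], [0.0, 1.0]]
    aligned_images = [[1.0, 0.0], [0.0, 1.0]]
    swapped_images = [[0.0, 1.0], [1.0, 0.0]]
    print(symmetric_contrastive_loss(texts, aligned_images))  # close to 0
    print(symmetric_contrastive_loss(texts, swapped_images))  # large
```

The loss is near zero when each text embedding is closest to its own image and large when pairs are mismatched. Minimizing it pulls paired text and image representations together in the shared space.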

* 11 pages
