MCSE: Multimodal Contrastive Learning of Sentence Embeddings

Apr 22, 2022
Miaoran Zhang, Marius Mosbach, David Ifeoluwa Adelani, Michael A. Hedderich, Dietrich Klakow

Learning semantically meaningful sentence embeddings is an open problem in natural language processing. In this work, we propose a sentence embedding learning approach that exploits both visual and textual information via a multimodal contrastive objective. Through experiments on a variety of semantic textual similarity tasks, we demonstrate that our approach consistently improves performance across various datasets and pre-trained encoders. In particular, by combining a small amount of multimodal data with a large text-only corpus, we improve the state-of-the-art average Spearman's correlation by 1.7%. By analyzing the properties of the textual embedding space, we show that our model excels at aligning semantically similar sentences, providing an explanation for its improved performance.
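
As a rough illustration of what a multimodal contrastive objective can look like, the sketch below combines an InfoNCE-style text-text term with text-image terms. It is not taken from the paper: the function names (`info_nce`, `multimodal_loss`), the temperature, the weight `lam`, and the assumption that sentence and image embeddings already live in a shared space of the same dimension are all illustrative choices.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def info_nce(anchors, positives, temperature=0.05):
    """InfoNCE loss: row i of `positives` is the positive for row i of
    `anchors`; every other row in the batch acts as a negative."""
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = a @ p.T / temperature                       # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def multimodal_loss(sent_view1, sent_view2, image_emb, lam=1.0):
    """Hypothetical combination: text-text term plus weighted text-image terms.
    `lam` is an assumed weighting, not a value from the paper."""
    text_term = info_nce(sent_view1, sent_view2)
    cross_term = info_nce(sent_view1, image_emb) + info_nce(sent_view2, image_emb)
    return text_term + lam * cross_term

# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
sentences = rng.normal(size=(8, 16))
noisy_view = sentences + 0.01 * rng.normal(size=(8, 16))  # second "view"
images = sentences + 0.10 * rng.normal(size=(8, 16))      # paired image embeddings
loss = multimodal_loss(sentences, noisy_view, images)
```

In this toy setup the loss is small when each sentence is paired with its own image and grows when the pairing is shuffled, which is the alignment pressure a contrastive objective is meant to apply.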

* Accepted by NAACL 2022 main conference (short paper), 11 pages  
