Title: A Mutual Information Maximization Perspective of Language Representation Learning

Nov 26, 2019

Lingpeng Kong, Cyprien de Masson d'Autume, Wang Ling, Lei Yu, Zihang Dai, Dani Yogatama

Abstract: We show that state-of-the-art word representation learning methods maximize an objective function that is a lower bound on the mutual information between different parts of a word sequence (i.e., a sentence). Our formulation provides an alternative perspective that unifies classical word embedding models (e.g., Skip-gram) and modern contextual embeddings (e.g., BERT, XLNet). In addition to enhancing our theoretical understanding of these methods, our derivation leads to a principled framework that can be used to construct new self-supervised tasks. We provide an example by drawing inspiration from related methods based on mutual information maximization that have been successful in computer vision, and introduce a simple self-supervised objective that maximizes the mutual information between a global sentence representation and n-grams in the sentence. Our analysis offers a holistic view of representation learning methods to transfer knowledge and translate progress across multiple domains (e.g., natural language processing, computer vision, audio processing).
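
As an illustration of what "a lower bound on the mutual information" can look like in practice, the sketch below computes InfoNCE, one widely used contrastive bound. This listing does not state which bound the paper derives, so treat the choice as an assumption. The function name `info_nce_bound` and the `scores` matrix are hypothetical: `scores[i][j]` stands for a learned critic score between part i of one view (for example, a sentence representation) and part j of another view (for example, an n-gram), with matching pairs on the diagonal.

```python
import math

def info_nce_bound(scores):
    """Estimate the InfoNCE lower bound on mutual information (in nats).

    scores[i][j] is a critic score f(a_i, b_j). The diagonal holds the
    positive (truly paired) examples; off-diagonal entries act as negatives.
    Returns log(N) + mean_i log softmax(scores[i])[i], which never exceeds log(N).
    """
    n = len(scores)
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the row max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += row[i] - log_z  # log-probability of picking the positive
    return math.log(n) + total / n

# Hypothetical scores: an uninformative critic and a sharply diagonal one.
uninformative = info_nce_bound([[0.0, 0.0], [0.0, 0.0]])
confident = info_nce_bound([[50.0, 0.0], [0.0, 50.0]])
```

With an uninformative critic the estimate is zero, and with a critic that cleanly separates positives from negatives it approaches its ceiling of log N, which is why such estimates depend on the number of negatives used.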

* 12 pages, 3 figures

View paper on OpenReview
