Title: Contrastive Bidirectional Transformer for Temporal Representation Learning

Jun 13, 2019

Chen Sun, Fabien Baradel, Kevin Murphy, Cordelia Schmid

Abstract: This paper aims at learning representations for long sequences of continuous signals. Recently, the BERT model has demonstrated the effectiveness of stacked transformers for representing sequences of discrete signals (i.e., word tokens). Inspired by its success, we adopt the stacked transformer architecture, but generalize its training objective to maximize the mutual information between the masked signals and the bidirectional context via a contrastive loss. This enables the model to handle continuous signals, such as visual features. We further consider the case where multiple sequences are semantically aligned at the sequence level but not at the element level (e.g., video and ASR), and propose to use a Transformer to estimate the mutual information between the two sequences, which is again maximized via a contrastive loss. We demonstrate the effectiveness of the learned representations on modeling long video sequences for action anticipation and video captioning. The results show that our method, referred to as Contrastive Bidirectional Transformer (CBT), outperforms various baselines significantly. Furthermore, we improve over the state of the art.
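To make the contrastive objective in the abstract concrete, here is a minimal sketch of an InfoNCE-style loss in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names (`dot`, `info_nce_loss`), the dot-product score, and the use of the other targets in the batch as negatives are all hypothetical choices made for this example. The idea is that the model output at each masked position should score higher against the true continuous feature for that position than against the distractors; minimizing a loss of this form maximizes a lower bound on the mutual information between the masked signal and its context.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors (assumed scoring function)."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(context_outputs, targets):
    """Average InfoNCE-style loss over masked positions.

    context_outputs[i]: hypothetical model output at masked position i.
    targets[i]: the true continuous feature at position i.
    All other entries of `targets` act as negatives for position i.
    """
    total = 0.0
    for i, h in enumerate(context_outputs):
        scores = [dot(h, x) for x in targets]
        # Numerically stable log-sum-exp over positive and negative scores.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of picking the true target.
        total += log_norm - scores[i]
    return total / len(context_outputs)


if __name__ == "__main__":
    targets = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[4.0, 0.0], [0.0, 4.0]]
    # Outputs that match their own targets give a much lower loss
    # than the same outputs paired with swapped targets.
    print(info_nce_loss(aligned, targets))
    print(info_nce_loss(aligned, targets[::-1]))
```

Because the loss only needs a score between an output and a candidate feature, it works directly on continuous vectors such as visual features, with no discrete vocabulary and no softmax over tokens as in BERT's original masked-token objective.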
