Title: D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations

Dec 11, 2020

Sanath Narayan, Hisham Cholakkal, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

Abstract: This work proposes a weakly-supervised temporal action localization framework, called D2-Net, which strives to temporally localize actions using video-level supervision. Our main contribution is the introduction of a novel loss formulation, which jointly enhances the discriminability of latent embeddings and robustness of the output temporal class activations with respect to foreground-background noise caused by weak supervision. The proposed formulation comprises a discriminative and a denoising loss term for enhancing temporal action localization. The discriminative term incorporates a classification loss and utilizes a top-down attention mechanism to enhance the separability of latent foreground-background embeddings. The denoising loss term explicitly addresses the foreground-background noise in class activations by simultaneously maximizing intra-video and inter-video mutual information using a bottom-up attention mechanism. As a result, activations in the foreground regions are emphasized whereas those in the background regions are suppressed, thereby leading to more robust predictions. Comprehensive experiments are performed on two benchmarks: THUMOS14 and ActivityNet1.2. Our D2-Net performs favorably in comparison to the existing methods on both datasets, achieving gains as high as 3.6% in terms of mean average precision on THUMOS14.
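
To illustrate why suppressing background activations matters for localization, the following is a minimal sketch of one common way per-snippet class activations can be turned into temporal segments: thresholding and grouping consecutive snippets. It is not taken from the paper. The function name `activations_to_segments`, the threshold value, and the toy scores are all assumptions made for illustration.

```python
from typing import List, Tuple


def activations_to_segments(
    activations: List[float], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Group consecutive snippets scoring above `threshold` into segments.

    Hypothetical helper, not D2-Net's post-processing. Returns
    (start, end) snippet indices with `end` exclusive.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # index where the currently open segment began, if any
    for t, score in enumerate(activations):
        if score > threshold and start is None:
            start = t  # a foreground run begins
        elif score <= threshold and start is not None:
            segments.append((start, t))  # the run ended at snippet t
            start = None
    if start is not None:
        # Close a run that extends to the end of the video.
        segments.append((start, len(activations)))
    return segments


# Toy per-snippet activations for a single action class (made-up values).
toy_scores = [0.1, 0.6, 0.9, 0.2, 0.7]
print(activations_to_segments(toy_scores))
```

Under this kind of thresholding, a background snippet with a spuriously high activation either extends a segment or creates a false one, which is the sort of foreground-background noise the denoising term is described as addressing.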
