Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification

Jun 21, 2021

Andrés Villa, Juan-Manuel Perez-Rua, Vladimir Araujo, Juan Carlos Niebles, Victor Escorcia, Alvaro Soto

Figure 1 for TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification

Figure 2 for TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification

Figure 3 for TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification

Figure 4 for TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification

Share this with someone who'll enjoy it:

Abstract:Recently, few-shot learning has received increasing interest. Existing efforts have been focused on image classification, with very few attempts dedicated to the more challenging few-shot video classification problem. These few attempts aim to effectively exploit the temporal dimension in videos for better learning in low data regimes. However, they have largely ignored a key characteristic of video which could be vital for few-shot recognition, that is, videos are often accompanied by rich text descriptions. In this paper, for the first time, we propose to leverage these human-provided textual descriptions as privileged information when training a few-shot video classification model. Specifically, we formulate a text-based task conditioner to adapt video features to the few-shot learning task. Our model follows a transductive setting where query samples and support textual descriptions can be used to update the support set class prototype to further improve the task-adaptation ability of the model. Our model obtains state-of-the-art performance on four challenging benchmarks in few-shot video action classification.

* 10 pages including references, 7 figures, and 4 tables

View paper on

Share this with someone who'll enjoy it:

Title:TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification

Paper and Code