Title: Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition

Aug 21, 2020

Zitong Yu, Benjia Zhou, Jun Wan, Pichao Wang, Haoyu Chen, Xin Liu, Stan Z. Li, Guoying Zhao

Abstract: Gesture recognition has attracted considerable attention owing to its great potential in applications. Although great progress has been made recently in multi-modal learning, existing methods still lack effective integration to fully explore the synergies among spatio-temporal modalities for gesture recognition. This is partly because existing manually designed network architectures are inefficient at jointly learning from multiple modalities. In this paper, we propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition. The proposed method includes two key components: 1) enhanced temporal representation via the proposed 3D Central Difference Convolution (3D-CDC) family, which captures rich temporal context by aggregating temporal difference information; and 2) optimized backbones for multi-sampling-rate branches and lateral connections among varied modalities. The resultant multi-modal multi-rate network provides a new perspective for understanding the relationship between RGB and depth modalities and their temporal dynamics. Comprehensive experiments on three benchmark datasets (IsoGD, NvGesture, and EgoGesture) demonstrate state-of-the-art performance in both single- and multi-modality settings. The code is available at https://github.com/ZitongYu/3DCDC-NAS
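
The abstract does not give the 3D-CDC formula, so the following is only a simplified illustration of what "aggregating temporal difference information" can look like. It is a one-dimensional, temporal-only analogue in plain Python, not the authors' 3D operator. The function name `central_difference_conv1d`, the blending parameter `theta`, and the zero padding are assumptions made for this sketch; the linked repository is the place to look for the actual implementation.

```python
def central_difference_conv1d(frames, weights, theta=0.5):
    """Temporal convolution blended with a central-difference term.

    Illustrative sketch only (hypothetical helper, not the paper's code).

    frames:  list of numbers, one feature value per time step
    weights: odd-length kernel over the temporal neighbourhood
    theta:   0.0 gives a plain convolution; 1.0 uses only the
             differences between each neighbour and the centre frame
    """
    if len(weights) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(weights) // 2
    # Zero-pad so the output has one value per input frame.
    padded = [0.0] * half + list(frames) + [0.0] * half
    weight_sum = sum(weights)
    out = []
    for t in range(len(frames)):
        window = padded[t:t + len(weights)]
        center = window[half]
        vanilla = sum(w * x for w, x in zip(weights, window))
        # For theta == 1 this equals sum(w * (x - center)).
        out.append(vanilla - theta * center * weight_sum)
    return out


if __name__ == "__main__":
    static = [2.0, 2.0, 2.0, 2.0, 2.0]   # no motion over time
    moving = [0.0, 1.0, 4.0, 9.0, 16.0]  # accelerating change
    print(central_difference_conv1d(static, [1.0, 1.0, 1.0], theta=1.0))
    print(central_difference_conv1d(moving, [1.0, 1.0, 1.0], theta=1.0))
```

With `theta=1.0`, a signal that does not change over time produces zero response away from the padded borders, so the output reflects only frame-to-frame change. With `theta=0.0` the function reduces to an ordinary zero-padded convolution.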

* Submitted to IEEE Transactions on Image Processing
