
Title: I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation

Oct 24, 2023

Yunyao Mao, Jiajun Deng, Wengang Zhou, Zhenbo Lu, Wanli Ouyang, Houqiang Li



Abstract: Recent progress in self-supervised 3D human action representation learning is largely attributed to contrastive learning. However, in conventional contrastive frameworks, the rich complementarity between different skeleton modalities remains under-explored. Moreover, because they are optimized to distinguish self-augmented samples, models struggle with the numerous similar positive instances that arise when the number of action categories is limited. In this work, we tackle these problems by introducing a general Inter- and Intra-modal Mutual Distillation (I$^2$MD) framework. In I$^2$MD, we first re-formulate cross-modal interaction as a Cross-modal Mutual Distillation (CMD) process. Unlike existing distillation solutions, which transfer the knowledge of a pre-trained, fixed teacher to the student, CMD continuously updates the knowledge and distills it bidirectionally between modalities during pre-training. To alleviate the interference of similar samples and exploit their underlying contexts, we further design the Intra-modal Mutual Distillation (IMD) strategy. In IMD, a Dynamic Neighbors Aggregation (DNA) mechanism first instantiates an additional cluster-level discrimination branch in each modality. This branch adaptively aggregates highly correlated neighboring features, forming local cluster-level contrasting. Mutual distillation is then performed between the two branches for cross-level knowledge exchange. Extensive experiments on three datasets show that our approach sets a series of new records.
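The abstract does not state the exact losses, so the following is only a minimal pure-Python sketch of the two ideas under explicit assumptions. It is not the authors' implementation. For the CMD part, we assume that each modality summarizes a sample as a softmax distribution of its cosine similarities to a memory bank of other samples, and that the two modalities distill into each other by minimizing KL divergence in both directions. This requires that `memory_a[i]` and `memory_b[i]` encode the same sequence, for example in "joint" and "motion" modalities. For the DNA part, we assume a similarity threshold decides which neighbors are merged into a cluster-level feature, which makes the neighbor count adaptive rather than fixed. All function names, the `threshold` value, and the temperatures (`t_teacher`, `t_student`, with a sharper teacher) are hypothetical choices for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    # L2-normalize so that dot products become cosine similarities.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def softmax(logits, temperature):
    # Numerically stable softmax of logits / temperature.
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p, q):
    # KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def similarity_distribution(query, memory, temperature):
    # How one sample relates to a bank of other samples within a single modality.
    q = normalize(query)
    return softmax([dot(q, normalize(m)) for m in memory], temperature)


def cross_modal_mutual_distillation(query_a, memory_a, query_b, memory_b,
                                    t_teacher=0.05, t_student=0.1):
    # Hypothetical CMD-style loss. Assumption: memory_a[i] and memory_b[i]
    # encode the same sequence in two modalities, so both distributions share
    # one support. Each modality teaches the other (bidirectional); in real
    # training the teacher distributions would be detached from the gradient.
    teach_a = similarity_distribution(query_a, memory_a, t_teacher)
    teach_b = similarity_distribution(query_b, memory_b, t_teacher)
    stud_a = similarity_distribution(query_a, memory_a, t_student)
    stud_b = similarity_distribution(query_b, memory_b, t_student)
    return kl_div(teach_a, stud_b) + kl_div(teach_b, stud_a)


def aggregate_neighbors(query, memory, threshold=0.8):
    # Hypothetical DNA-style step: the number of neighbors is not fixed; every
    # memory feature whose cosine similarity to the query exceeds `threshold`
    # is summed with the query and renormalized into one cluster-level feature.
    q = normalize(query)
    keys = [normalize(m) for m in memory]
    picked = [i for i, k in enumerate(keys) if dot(q, k) > threshold]
    summed = list(q)
    for i in picked:
        summed = [s + x for s, x in zip(summed, keys[i])]
    return normalize(summed), picked


# Toy usage with 2-D features for two modalities of the same three samples.
memory_joint = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
memory_motion = [[0.8, 0.2], [1.0, 0.0], [0.1, 0.9]]
loss = cross_modal_mutual_distillation([1.0, 0.05], memory_joint,
                                       [0.9, 0.1], memory_motion)
cluster_feature, neighbors = aggregate_neighbors([1.0, 0.05], memory_joint)
print(neighbors)  # prints [0, 1]
```

In the toy call, the third memory entry is nearly orthogonal to the query, so only the first two entries are merged into the cluster-level feature. A threshold is used here instead of a fixed top-k purely to illustrate "adaptive" aggregation; the actual DNA criterion may differ.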

* submitted to IJCV. arXiv admin note: substantial text overlap with arXiv:2208.12448
