Alert button
Picture for Junlong Du

Junlong Du

Alert button

Seeing in Flowing: Adapting CLIP for Action Recognition with Motion Prompts Learning

Aug 09, 2023
Qiang Wang, Junlong Du, Ke Yan, Shouhong Ding

The Contrastive Language-Image Pre-training (CLIP) has recently shown remarkable generalization on "zero-shot" training and has applied to many downstream tasks. We explore the adaptation of CLIP to achieve a more efficient and generalized action recognition method. We propose that the key lies in explicitly modeling the motion cues flowing in video frames. To that end, we design a two-stream motion modeling block to capture motion and spatial information at the same time. And then, the obtained motion cues are utilized to drive a dynamic prompts learner to generate motion-aware prompts, which contain much semantic information concerning human actions. In addition, we propose a multimodal communication block to achieve a collaborative learning and further improve the performance. We conduct extensive experiments on HMDB-51, UCF-101, and Kinetics-400 datasets. Our method outperforms most existing state-of-the-art methods by a significant margin on "few-shot" and "zero-shot" training. We also achieve competitive performance on "closed-set" training with extremely few trainable parameters and additional computational costs.

* Accepted by ACM MM 2023 
Viaarxiv icon

NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination

Jul 27, 2020
Penghao Zhou, Chong Zhou, Pai Peng, Junlong Du, Xing Sun, Xiaowei Guo, Feiyue Huang

Figure 1 for NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination
Figure 2 for NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination
Figure 3 for NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination
Figure 4 for NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination

Greedy-NMS inherently raises a dilemma, where a lower NMS threshold will potentially lead to a lower recall rate and a higher threshold introduces more false positives. This problem is more severe in pedestrian detection because the instance density varies more intensively. However, previous works on NMS don't consider or vaguely consider the factor of the existent of nearby pedestrians. Thus, we propose Nearby Objects Hallucinator (NOH), which pinpoints the objects nearby each proposal with a Gaussian distribution, together with NOH-NMS, which dynamically eases the suppression for the space that might contain other objects with a high likelihood. Compared to Greedy-NMS, our method, as the state-of-the-art, improves by $3.9\%$ AP, $5.1\%$ Recall, and $0.8\%$ $\text{MR}^{-2}$ on CrowdHuman to $89.0\%$ AP and $92.9\%$ Recall, and $43.9\%$ $\text{MR}^{-2}$ respectively.

* Accepted at the ACM International Conference on Multimedia (ACM MM) 2020 
Viaarxiv icon