Pichao Wang

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

Nov 20, 2023
Via arXiv

Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey

Oct 19, 2023
Via arXiv

SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels

Sep 18, 2023
Via arXiv

Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition

Sep 11, 2023
Via arXiv

Revisiting Vision Transformer from the View of Path Ensemble

Aug 12, 2023
Via arXiv

Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment

Jul 24, 2023
Via arXiv

DOAD: Decoupled One Stage Action Detection Network

Apr 04, 2023
Via arXiv

Making Vision Transformers Efficient from A Token Sparsification View

Mar 30, 2023
Via arXiv

PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation

Mar 30, 2023
Via arXiv

Selective Structured State-Spaces for Long-Form Video Understanding

Mar 25, 2023
Via arXiv