Pichao Wang

Hallucination of Multimodal Large Language Models: A Survey

Apr 29, 2024
Via arXiv

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

Mar 26, 2024
Via arXiv

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

Nov 20, 2023
Via arXiv

Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey

Oct 19, 2023
Via arXiv

SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels

Sep 18, 2023
Via arXiv

Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition

Sep 11, 2023
Via arXiv

Revisiting Vision Transformer from the View of Path Ensemble

Aug 12, 2023
Via arXiv

Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment

Jul 24, 2023
Via arXiv

DOAD: Decoupled One Stage Action Detection Network

Apr 04, 2023
Via arXiv

PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation

Mar 30, 2023
Via arXiv