Jianqin Yin

MTDA-HSED: Mutual-Assistance Tuning and Dual-Branch Aggregating for Heterogeneous Sound Event Detection

Sep 11, 2024
Via arXiv

SELD-Mamba: Selective State-Space Model for Sound Event Localization and Detection with Source Distance Estimation

Aug 09, 2024
Via arXiv

ActivityCLIP: Enhancing Group Activity Recognition by Mining Complementary Information from Text to Supplement Image Modality

Jul 29, 2024
Via arXiv

Micro-expression recognition based on depth map to point cloud

Jun 12, 2024
Via arXiv

CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering

May 13, 2024
Via arXiv

DHRNet: A Dual-Path Hierarchical Relation Network for Multi-Person Pose Estimation

Apr 27, 2024
Via arXiv

OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation

Apr 25, 2024
Via arXiv

Towards more realistic human motion prediction with attention to motion coordination

Apr 04, 2024
Via arXiv

Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection

Jan 10, 2024
Via arXiv

Spatial-Temporal Decoupling Contrastive Learning for Skeleton-based Human Action Recognition

Jan 09, 2024
Via arXiv