Zhenzhen Hu

Rebalancing Contrastive Alignment with Learnable Semantic Gaps in Text-Video Retrieval

May 18, 2025

Concept Drift Guided LayerNorm Tuning for Efficient Multimodal Metaphor Identification

May 16, 2025

VAEmo: Efficient Representation Learning for Visual-Audio Emotion with Knowledge Injection

May 05, 2025

PhysioSync: Temporal and Cross-Modal Contrastive Learning Inspired by Physiological Synchronization for EEG-Based Emotion Recognition

Apr 24, 2025

Video Flow as Time Series: Discovering Temporal Consistency and Variability for VideoQA

Apr 08, 2025

Agent Journey Beyond RGB: Unveiling Hybrid Semantic-Spatial Environmental Representations for Vision-and-Language Navigation

Dec 10, 2024

Decomposing Relationship from 1-to-N into N 1-to-1 for Text-Video Retrieval

Oct 09, 2024

UniLearn: Enhancing Dynamic Facial Expression Recognition through Unified Pre-Training and Fine-Tuning on Images and Videos

Sep 10, 2024

Seeing is Believing? Enhancing Vision-Language Navigation using Visual Perturbations

Sep 09, 2024

Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering

Oct 27, 2023