Dan Guo

Motion is the Choreographer: Learning Latent Pose Dynamics for Seamless Sign Language Generation

Aug 06, 2025

CLASP: Cross-modal Salient Anchor-based Semantic Propagation for Weakly-supervised Dense Audio-Visual Event Localization

Aug 06, 2025

Learning Speaker-Invariant Visual Features for Lipreading

Jun 09, 2025

EmoSEM: Segment and Explain Emotion Stimuli in Visual Art

Apr 22, 2025

Towards Efficient Partially Relevant Video Retrieval with Active Moment Discovering

Apr 15, 2025

SUEDE: Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement

Apr 07, 2025

A Survey on fMRI-based Brain Decoding for Reconstructing Multimodal Stimuli

Mar 20, 2025

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering

Feb 11, 2025

AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring

Jan 16, 2025

Linguistics-Vision Monotonic Consistent Network for Sign Language Production

Dec 22, 2024