Wanlong Fang

Hierarchical Semantic-Augmented Navigation: Optimal Transport and Graph-Driven Reasoning for Vision-Language Navigation

Jun 01, 2026
Via arXiv

Turing Patterns for Multimedia: Reaction-Diffusion Multi-Modal Fusion for Language-Guided Video Moment Retrieval

Jun 01, 2026
Via arXiv

Towards Understanding Modality Interaction in Multimodal Language Models via Partial Information Decomposition

May 31, 2026
Via arXiv

Immuno-VLM: Immunizing Large Vision-Language Models via Generative Semantic Antibodies for Open-World Trustworthiness

May 29, 2026
Via arXiv

Annotations Are Not All You Need: A Cross-modal Knowledge Transfer Network for Unsupervised Temporal Sentence Grounding

May 29, 2026
Via arXiv

SLAP: The Semantic Least Action Principle for Variational Video-Language Modeling

May 29, 2026
Via arXiv

Not All Inputs Are Valid: Towards Open-Set Video Moment Retrieval Using Language

May 28, 2026
Via arXiv

Fewer Steps, Better Performance: Efficient Cross-Modal Clip Trimming for Video Moment Retrieval Using Language

May 28, 2026
Via arXiv

CogniVerse: Revolutionizing Multi-Modal Retrieval-Augmented Generation with Cognitive Reflection and Geometric Reasoning

May 28, 2026
Via arXiv

Rethinking Video-Language Model from the Language Input Perspective

May 27, 2026
Via arXiv