Rong Shen

MASR: Self-Reflective Reasoning through Multimodal Hierarchical Attention Focusing for Agent-based Video Understanding

Apr 28, 2025
Via arXiv

MCAF: Efficient Agent-based Video Understanding Framework through Multimodal Coarse-to-Fine Attention Focusing

Apr 24, 2025
Via arXiv