Shiwen Cao

MCAF: Efficient Agent-based Video Understanding Framework through Multimodal Coarse-to-Fine Attention Focusing

Apr 24, 2025
Via arXiv